ABSTRACT

Presented is a review and synthesis of research in science education
conducted with students in grades 6 through 12 during the years 1964-78.
In the meta-analysis process, eight constructs linked to learning
outcomes were used as the theoretical basis for the literature
selection. The constructs include quality and quantity of instruction;
student ability; motivation; age or developmental level; and home, peer,
and classroom environments. Included in this report are nine
research-synthesis papers and related appendices. Five papers are
focused on the dependence of science learning on one or more of the
constructs used in the study; one summarizes the implications of this
project for future research syntheses and for conducting future primary
empirical studies, while other papers are concerned with sex differences
in science learning and the effects of the science curriculum efforts
after 1958. Appendices include a discussion of the significance of
research synthesis, a computer codebook for characterizing the studies,
and an interim report on the project. (Author/PB)

Introduction and Overview

Herbert J. Walberg
F. David Boulanger
Barbara K. Kremer
Geneva D. Haertel
Thomas Weinstein

College of Education
University of Illinois at Chicago Circle
Box 4348
Chicago, Illinois 60680
June, 1980

This brief introduction and overview is intended to provide an overall
perspective for the reader on the nine research-synthesis papers and related
appendices that constitute the remainder of this report of research syntheses
carried out under the support of the National Science Foundation. The nine
papers are each self-contained to a large extent. Most are either in press
or submitted to journals and thus contain independent statements of purpose,
method, findings, and educational and research implications. Only in this
complete report, however, can the interested reader find all the papers in
one document together with detailed supportive material about the project
that is unlikely to be published in a journal but which, we hope, will save
future researchers a great deal of time and effort in carrying out similar
research syntheses in education.
A 2 



As indicated in the Table of Contents, the sections of this report are
identified by letter, and the pages are numbered within sections. Thus, for
example, this page is the first of the first section and is identified as A 2.

Section B, "Science Education Research," is a brief overview of the
scope, purpose, method, results, and research recommendations of the
project. It will appear along with 45 other research syntheses in a special
1980 issue of Evaluation in Education: International Progress, edited by
Herbert J. Walberg (principal investigator for this project) and Edward H.
Haertel. In addition to more specific details on these points, the
remaining papers also draw implications of the research syntheses for the
improvement of science teaching and learning.

Sections C and D concern the dependence of science learning on student
age, ability, and developmental level. The paper on ability has been
accepted for publication in the Journal of Research in Science Teaching, and
the paper on age and development has been submitted to the same journal.

Sections E, F, and G concern syntheses of social and psychological
influences, instructional techniques, and classroom social environments in
relation to science learning. The paper on social and psychological
influences has been accepted for publication in Science Education, and the
paper on instruction will be published in the Journal of Research in
Science Teaching. The paper on social environments has been submitted to
Educational and Psychological Measurement.

Section H gathers together the implications of the project not only
for future research syntheses in science education but also for conducting
future primary empirical studies. The paper has been submitted to the journal
Science Education.


The remaining sections go beyond the original proposed scope of the
project. The first, a preliminary synthesis of sex differences in science
learning, served as the basis of a proposal for a full-scale research
synthesis by Barbara Kremer and a collaborator at the University of Illinois
at Urbana with the principal investigator of the present project as
consultant. The last section is a full-scale research synthesis of the
effects of the large science curriculum efforts carried out after 1958. It
has been submitted to the Review of Educational Research.

The three appendices provide material that may be valuable to
investigators who plan syntheses of research in science education. The
first appendix discusses the potential and significance of research
synthesis in education as well as prior efforts. The second appendix is the
computer codebook for characterizing the studies, which required a great deal
of effort and group discussion. The final appendix contains the interim
report on the project.

In conclusion, we wish to acknowledge the support of the National Science
Foundation, our project officer Raymond J. Hannapel, and his colleagues
Mary Budd Rowe and F. James Rutherford. We are also grateful to James Kulik
and Wayne Welch, who served as consultants to the project. Perhaps it goes
without saying, however, that errors and opinions in this report are strictly
our own.



Barbara K. Kremer, F. David Boulanger,
Geneva Haertel, and Herbert J. Walberg
University of Illinois at Chicago Circle
College of Education, Box 4348
Chicago, Illinois 60680

This paper summarizes systematic syntheses and integrative
reviews of 15 years of science education research, spanning the
years 1964-78, conducted with students in grades 6 through 12.
This project was initiated partly in response to recommendations
of the NARST-NIE Commission on Research in Science Education
report (Yager, 1978), which stated the need for more broadly based
theoretical models, research reviews, and field studies to explain
science learning. The selection of literature for this synthesis
was guided by a psychological model of learning productivity
(Walberg, 1980). This model identifies eight constructs that are
linked to learning outcomes. The constructs are quality and
quantity of instruction; student ability; motivation; age or
developmental level; and home, peer, and classroom environments.

The principal goals of the synthesis were to investigate the
dependence of science learning on each of the eight constructs
represented in the productivity model, to identify promising
directions for science education research, and to provide policy
makers with a comprehensive, quantitatively based guide to what
is known about the major factors influencing science learning.

The limitation of grade levels 6-12 was chosen so as to
include the usual range of science course offerings in the United
States, beginning with required science courses at grade 6 and
concluding with the elective program of the high school. The age
period represented by these grade levels encompasses the onset of
formal operational thinking (Inhelder and Piaget, 1958). The
fifteen-year literature period was chosen since it represents a time
of major curriculum reform in science education and a corresponding
increase in the quality and quantity of published research.

Five reviews based on the eight constructs in the productivity
model were conducted by the authors (the social-psychological
constructs motivation, home, and peer-group environment were combined
in one review). Features of the literature sampled and major
conclusions are summarized in Table 1. In selecting literature for
these reviews, the authors examined studies published in refereed
journals, unpublished and unrefereed research reports, and
dissertations. Because of the large volume of studies, dissertations
were not searched for the quality of instruction construct, and
selection was limited to studies with instructional variables
represented in five or more published studies within this construct
area.

All studies selected for this synthesis related some measure
of a construct variable in the model to a science-learning outcome.
Where statistical reporting was adequate, quantitative methods of
research synthesis, involving effect sizes and correlations, were
applied (Glass, 1978). Where statistical reporting was incomplete
and therefore precluded the calculation of effect sizes, research
findings were synthesized using modified quantitative techniques
including box scores and visual displays of data points.
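
The two synthesis approaches named above can be illustrated with a
minimal sketch. It assumes the common definition of an effect size as the
difference between treatment and control means divided by the control-group
standard deviation, and it treats a box score as a simple tally of the
direction of findings. The function names and all scores are hypothetical
and are not taken from the reviewed studies.

```python
from statistics import mean, stdev


def glass_effect_size(treatment_scores, control_scores):
    """Standardized mean difference: (treatment mean - control mean) / control SD."""
    return (mean(treatment_scores) - mean(control_scores)) / stdev(control_scores)


def box_score(directions):
    """Tally findings coded '+', '0', or '-' when effect sizes cannot be computed."""
    return {sign: directions.count(sign) for sign in ("+", "0", "-")}


# Hypothetical scores, for illustration only.
treatment = [14, 16, 18, 20, 22]
control = [10, 12, 14, 16, 18]
effect = glass_effect_size(treatment, control)

# Hypothetical directions of five findings from incompletely reported studies.
tally = box_score(["+", "+", "0", "-", "+"])

print(round(effect, 2), tally)
```

An effect size carries magnitude as well as direction, whereas a box score
records direction only, which is why the latter serves as the fallback when
reporting is incomplete.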

A Brief Summary of Results and Recommendations for Research

Table 1 shows that 922 summary numerical-data points such as 
correlations and effect sizes could be extracted from 151 published 
studies. A great number of research findings are available in some 
areas such as the social environment and quality of instruction, 
but science education research on other important constructs such as 
motivation, home, and peer environments is meager. 
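
The totals quoted above can be checked against the rows of Table 1. The
short sketch below simply adds the study counts and data-point counts as
they are read from that table; the dictionary layout and key names are an
illustrative assumption.

```python
# Study and data-point counts as listed in Table 1 of this report.
table_1 = {
    "Age/developmental level": {"studies": 27, "data_points": 17 + 21},
    "Ability": {"studies": 34, "data_points": 67},
    "Motivation": {"studies": 5, "data_points": 5},
    "Home environment": {"studies": 13, "data_points": 12},
    "Peer environment": {"studies": 5, "data_points": 5},
    "Quality of instruction": {"studies": 52, "data_points": 57},
    "Quantity of instruction": {"studies": 3, "data_points": 4},
    "Classroom social environment": {"studies": 12, "data_points": 734},
}

total_studies = sum(row["studies"] for row in table_1.values())
total_data_points = sum(row["data_points"] for row in table_1.values())

print(total_studies, total_data_points)  # 151 922
```

The classroom social environment review alone contributes 734 of the 922
data points, which is consistent with the uneven coverage of constructs
noted above.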

The results summarized in the table speak for themselves, but
several overall points seem worth noting here. The results support
a key notion of the productivity model: that learning is not a
function of only one or a few major constructs, as assumed in much
published research, but is consistently correlated and undoubtedly
causally implicated with at least eight constructs. With the
exception of quantity of instruction, results for which are thin, the
findings in science education generally coincide in sign, consistency,
and magnitude with previous syntheses conducted in other school
subjects, particularly reading and mathematics, and indicate that all
eight constructs require consideration in efforts to improve the
productivity of academic learning. Current work is devoted to
extending the findings, making more explicit comparisons of the
productivity of constructs in science and other subjects, making
final preparations of detailed technical reports for journal
publication, and writing non-technical articles on the synthesis
implications for policy and practitioner audiences. The rest of this
brief article summarizes methodological recommendations.

Future research should include more consistent reporting
procedures, more studies of construct areas slighted in science
education research, replication of consistent findings within
construct areas, and more rigorous design and sampling
procedures. Study reports should routinely include means and
standard deviations of all experimental and comparison-group outcomes
to make future quantitative syntheses possible and more
comprehensive. The generalizability of individual studies as well as
future syntheses of research stand to benefit from greater attention to
the description of the populations represented by the sample. This
description should at least include the occupational composition
of the community, or SES, and whether the community is urban,
suburban, or rural in character. It should also include a
description of the type of curriculum (whether academic, general, or
vocational) in which sampled students are enrolled, and student
ethnicity.

The reliabilities of instruments measuring construct variables
and science learning outcomes should be reported, including the
reliability of treatment implementation in experimental studies.
Correlations have been observed to vary as a function of
measurement reliability.

Descriptions of instruments, including types of items and content,
should be included in reports so that judgments of validity
and learning domain will be facilitated. This point is especially
important in the case of unpublished and locally developed
instruments.

Surveys of literature in motivation, home environment, and
peer environment constructs reveal that science education
researchers have paid little attention to these variables. Nevertheless, the
consistent, positive direction of findings observed in studies of
these constructs makes a strong case for their consideration in
future research. The consistency and parallelism of results observed
in studies of student motivation and home environment with previous
work in general education suggest the need for further direct
investigation of these constructs as control or stratification factors
in studies of curriculum and instruction.

Studies replicating major findings in the quality of instruction,
ability, and classroom environment construct areas are recommended.
However, the replication of these results need not rigorously follow
the details of previous studies. Instead, it is recommended that
future studies employ more robust designs incorporating multiple
outcomes and independent variables representing different
construct areas. Experimental designs would be improved if such
factors as ability, motivation, and classroom environment could be
overtly partialed out and their control not be assumed by random
assignment. This approach would lead to a better accounting of
the sources of variance in outcomes and lead to better prediction
and control.

Table 1
Science Education Research on Eight Construct Areas

Construct(s) Reviewed: Age/Developmental Level
Authors: Boulanger and Kremer (1980)
Number of Studies: 27
Source of Articles: ERIC science education bibliographies and annual
    reviews, and articles published in the Journal of Research in Science
    Teaching and Science Education. Two recent dissertations were included.
Number of Data Points Summarized: 17 median correlations; 21 median
    correlations
Major Conclusions: Age was found to be a poor predictor of conceptual
    outcomes or logical operations in science achievement. The mean
    within-grade correlation of developmental level with cognitive
    achievement was .45. Annual increments in cognitive achievement
    averaged 10 percentile points, and developmental level [illegible]
    percentile points. Interventions to increase increments are reported
    under the quality of instruction construct.

Construct(s) Reviewed: Ability
Authors: Boulanger (1980)
Number of Studies: 34
Source of Articles: ERIC science education bibliographies and annual
    reviews, and articles published in the Journal of Research in Science
    Teaching and Science Education.
Number of Data Points Summarized: 67 median correlations
Major Conclusions: Relationship between ability and achievement is very
    stable. Ability accounts for an average of 25% of the variance in
    science learning. Ability measures are better predictors of cognitive
    achievement than developmental measures.

Construct(s) Reviewed: Motivation, Home Environment, and Peer Influence
Authors: Kremer and Walberg (1980)
Number of Studies: Motivation, 5; Home Environment, 13; Peer
    Environment, 5
Source of Articles: Journal of Research in Science Teaching and Science
    Education were reviewed for 1964-1979. School Science and Mathematics,
    Journal of Educational Psychology, Developmental Psychology, and
    Sociology of Education for 1971-1977 were also searched. ERIC and
    Social Science Citation Index as well as science education
    bibliographies and annual reviews were also consulted.
Number of Data Points Summarized: Motivation, 5 correlations; Home
    environment, 12 study-median correlations; Peer environment, 5
    correlations
Major Conclusions:
    Motivation: The mean correlation for student motivation and science
    learning was .37. Higher correlations were obtained with standardized
    scales than with specially constructed measures.
    Home environment: Ten out of thirteen studies showed positive
    relationships between parental socio-economic status and science
    learning. The mean correlation was .25. Parent education and
    aspiration, and involvement in the child's science education yielded
    a correlation of .36 with achievement.
    Peer environment: No consistent trends were observed in the peer
    construct. Isolated positive effects were found in the few studies
    located, but most showed no effects.

Construct(s) Reviewed: Quality of Instruction
Authors: Boulanger (1980)
Number of Studies: 52
Source of Articles: Published articles found through ERIC science
    education bibliographies and annual reviews and articles published in
    the Journal of Research in Science Teaching and Science Education.
    Dissertations excluded.
Number of Data Points Summarized: 57 effect sizes
Major Conclusions: Percentile improvements in cognitive achievement due
    to interventions were: preinstructional strategies, 34; training in
    scientific thinking, 30; high structure verbal content over lower
    structure, 27; realism or concreteness in adjunct materials, 22.
    Indirect and inductive strategies showed no differences compared to
    direct and deductive strategies.

Construct(s) Reviewed: Quantity of Instruction
Authors: Boulanger (1980)
Number of Studies: 3
Source of Articles: Same as Quality of Instruction.
Number of Data Points Summarized: 4 effect sizes
Major Conclusions: Amount of time spent on a given unit of material holds
    no significant overall relationship to amount learned in the limited
    number of studies found.

Construct(s) Reviewed: Social Environment of the Classroom
Authors: Haertel, Walberg, and Haertel (1979)
Number of Studies: 12
Source of Articles: A search was made of 15 years of the Dissertation
    Abstracts, Education Index, Psychological Abstracts, Social Science
    Citation Index, and the annual research summaries sponsored by the
    National Association for Research in Science Teaching (1964-1978).
Number of Data Points Summarized: 734 correlations

Ability and Science Learning:
A Quantitative Synthesis
F. David Boulanger
University of Illinois at Chicago Circle

Running head: Ability and Science Learning

Abstract

The quantitative relationship of measured ability to measured
science learning was synthesized from the reported correlations in
34 studies on grade 6 through 12 students over a 16 year period. The
findings indicate a stable central tendency and deviation of
correlations across ability and cognitive learning outcome categories and
across several study variables such as sample size. Reliability of
measures had the greatest and only statistically significant
influence on ability-cognitive outcome correlations. Ability was found to
account for an average of 23 percent of the variance in science
learning.

Ability and Science Learning:
A Quantitative Synthesis

One of the uncontested findings of educational research is the
relationship between measures of ability and school learning. In 1930,
St. John could conclude, "The intercorrelations of all the criteria of
intelligence and educational achievement are without exception positive . . . "
(p. 141). A more recent large scale national survey, Project Talent
(Flanagan, Davis, Dailey, Shaycoft, Orr, Goldberg, and Neyman, 1964),
reaffirmed this general finding; all reported correlations between
measures of ability and achievement were positive. In a review of
research on cognitive characteristics that influence learning, Bloom (1976)
reported universally positive relationships between past achievement or
ability and learning in several subject areas.

Although consistent in direction, past studies, whether large scale
or small, reported different estimates of the size of the ability-learning
correlation. A scan of published research in science education reveals
a wide variation in correlations of ability with science learning. It
appears that the quality of different measures of ability or of science
learning at different grade levels and under different study conditions
might account for some of the variation in the reported correlations.
Correlations may differ with such study conditions as sample size,
subject matter, ability level of students, and research design.

The purpose of this study is to review, analyze, and synthesize
published studies relating ability measures to science learning measures
in order to establish the best estimates of such correlations under
various study conditions. The estimates provide science education researchers
and practitioners summary statistics for comparing the ability factor with
other factors influencing science learning.

Of particular interest for future reviews and syntheses are the eight
constructs enumerated in Walberg's (1978) Productivity Model, which draws on
the general education empirical literature and provisionally identifies
the primary factors influencing general school learning. The constructs are:
ability, motivation, and age or developmental level; quality and quantity of
instruction; and home, peer, and classroom social environments. The unique
features of science instruction such as laboratories, the use of quantitative
skills, and the cumulative nature of the subject matter suggest that estimates
of correlations of the constructs with general learning outcomes may not be
accurate for science learning. The present study provides an estimate of the
ability influence on science learning, with future studies providing estimates
for the other constructs. Such broadly based reviews drawing on general
education research findings to inform and augment science education research are
nationally identified needs (Yager, 1978).

Literature Search, Selection, and Coding

To assemble a body of literature reflecting the best current science
education research, yet sufficiently extended in time to include the recent
period of growth in curriculum development and research, the published
research in science education over the 1963-1978 period was included in the
literature search. The search was further limited to studies conducted with
subjects in grades 6 through 12 to include the pre-college science program
from the grade that is typically the beginning of required, specially taught
science courses through the elective senior high courses taken by a minority
of students.

Ability was initially defined as any cognitive measure that predicts
science learning. Using this definition, thirty-four published studies
were identified that correlated one or more measures of ability or past
achievement with a science learning outcome. Studies including ability
measures as blocking variables or as covariates in ANCOVA were excluded
unless a zero-order correlation was reported between ability and outcome
measures. (Calculated estimates of r from blocking factors were judged
inaccurate.) The Appendix contains a bibliography of included studies.

All assembled studies were numerically coded according to the
following study variables: the type, source, and reliability of the ability
and the outcome measures; the type of intervention and the elapsed time
between measures; grade level, ability level, and science subject area of
the sample; the ethnic, urban-rural, and SES character of the community; the
design of the study, unit of analysis, and methodological flaws; and
reported correlations. In total, over forty study variables were recorded
for each study on prepared code sheets. An independent check by a second
researcher of the reliability of coding routinely revealed about 90 percent
agreement. Codebook, codesheets, and raw data are available in the project
final report (Walberg, Boulanger, Kremer, & Haertel, in preparation).
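
The coding check described above can be sketched as a simple percent
agreement between two coders. The report does not state how agreement was
computed, so the item-by-item comparison below is an assumption, and the
function name and codes are hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Share of coded values on which two coders agree, as a percentage."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same study variables")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical numeric codes for ten study variables from one study.
first_coder = [1, 3, 2, 2, 4, 1, 0, 5, 2, 3]
second_coder = [1, 3, 2, 1, 4, 1, 0, 5, 2, 3]

agreement = percent_agreement(first_coder, second_coder)
print(agreement)  # 90.0
```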

The coding process yielded three ability categories and four learning
outcome categories forming a 3 x 4 or 12 cell ability by outcome matrix.
The three ability categories were general ability, prior achievement, and
quantitative-spatial reasoning. The four outcome categories were factual,
product, process, and attitudinal learning. Table 1 presents the definition
and example measures for each of the ability and outcome categories.

Insert Table 1 about here

Results and Analysis

To ensure the independence of each correlation in a given
ability-outcome cell, each study's median correlation for a given cell was
computed, reducing the original 207 raw correlations to 67 median
correlations, which were used throughout the analysis. When combining
correlations, ordinary means and standard deviations were computed following
the arguments of Glass (1978) and empirical results of Uguroglu and Walberg
(in press) that z-transformations make little difference in means when
combining correlations in the range of values of correlations in this study.
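
The two steps described in this paragraph can be sketched as follows:
first reduce each study to one median correlation per cell, then compare
the ordinary mean of those medians with a mean taken through Fisher's z
transformation. The study identifiers and correlation values are
hypothetical; only the procedure follows the text.

```python
import math
from collections import defaultdict
from statistics import mean, median

# Hypothetical raw correlations: (study id, ability-outcome cell, r).
raw = [
    ("study_a", "general/product", 0.40),
    ("study_a", "general/product", 0.50),
    ("study_a", "general/product", 0.60),
    ("study_b", "general/product", 0.35),
    ("study_b", "general/product", 0.45),
    ("study_c", "general/product", 0.55),
]

# Step 1: keep one median correlation per study per cell.
grouped = defaultdict(list)
for study, cell, r in raw:
    grouped[(study, cell)].append(r)
study_medians = {key: median(values) for key, values in grouped.items()}

# Step 2: ordinary mean versus mean computed through Fisher's z.
rs = list(study_medians.values())
ordinary_mean = mean(rs)
z_mean = math.tanh(mean(math.atanh(r) for r in rs))

print(round(ordinary_mean, 3), round(z_mean, 3))
```

For correlations of moderate size such as these, the two means differ
only in the third decimal place, which illustrates why the simpler
ordinary mean was considered adequate.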

Only five of the 67 correlations related ability to attitudes. Based
on two (.16 and .28) and three (.30, .24, and .38) correlations respectively,
the means of study-median correlations of general ability and prior achievement
with attitudes are .22 and .31. The overall ability-attitude mean correlation
is .27 with a standard deviation of .07. Given the small number of
correlations, no further analysis of the ability-attitude relationship was
attempted.
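
The summary values for the five ability-attitude correlations can be
reproduced directly from the correlations quoted above. In this sketch the
standard deviation is computed with the population formula (dividing by n),
which is an assumption; it is the formula under which the reported .07 is
recovered.

```python
from statistics import mean, pstdev

general_ability = [0.16, 0.28]          # correlations with attitudes
prior_achievement = [0.30, 0.24, 0.38]  # correlations with attitudes
all_five = general_ability + prior_achievement

print(round(mean(general_ability), 2))    # 0.22
print(round(mean(prior_achievement), 2))  # 0.31
print(round(mean(all_five), 2))           # 0.27
print(round(pstdev(all_five), 2))         # 0.07
```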

The three cognitive outcomes (factual, product, and process) were
analyzed together. Table 2 shows the means and number of study-median
correlations in each of the nine ability-cognitive outcome cells along
with marginals combining correlations across categories. No correlations
of quantitative-spatial ability with process outcomes were found, which
leaves that cell empty. One cell mean, prior achievement with factual
outcome, was based on only one correlation, while the general ability with
product outcome cell contained the most, 16 correlations. The range of
the mean ability-outcome correlations across the eight cells (empty cell
excluded) was .41 (prior achievement with factual outcome) to .53
(quantitative-spatial ability with product outcome).

Insert Table 2 about here

A two-way analysis of variance was conducted to determine if the
differences among categories were attributable to chance. Main effects
(Ability: F = .46, p = .64; Outcome: F = .38, p = .69) and interactions
(F = .11, p = .95) were non-significant, leading to the simplification that
all three ability categories were, within statistical error, equally good
predictors of any of the three cognitive outcomes. Combining the 62
correlations across all cells for the best overall estimate of the
ability-cognitive outcome correlation yielded a mean of .48 with a standard
deviation of .15.

Since no statistically valid distinction could be made among the
correlations relating the various subcategories of ability and cognitive
outcome, the analysis of the influence of other study variables such as sample
size, subject matter, and study design was conducted on the entire 62
correlation data set. To determine if any study variable systematically biased
the reported ability-outcome correlations, the values of study variables were
dichotomized into approximately equal subgroups and the t-test applied to
compare each resulting subgroup pair. Study variables whose values were
constant or nearly constant across studies (e.g., mixed sex of sample,
individual as unit of analysis) or were rarely reported (e.g., ethnic
composition, community type, SES) were dropped from this analysis since it was
clear they would not be significantly associated with systematic differences
among the correlations. Table 3 reports dichotomized values and t-test results
of the variables included.

Insert Table 3 about here

The results in Table 3 indicate only one difference significant at the
p < .05 level: cognitive outcome measures with reliabilities higher than .80
yielded higher correlations with ability than cognitive outcome measures
with reliabilities less than .80. Two variables had differences at the
p < .10 level: published (usually standardized) ability measures yielded
higher correlations with cognitive outcome measures than locally
produced ability measures, and higher reliability (greater than .90) ability
measures gave higher correlations with cognitive outcome measures than
lower reliability (less than .90) ability measures.
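
The dichotomize-and-compare procedure can be sketched as follows. The
split at the median and the pooled-variance form of the t statistic are
assumptions made for illustration, since the text says only that values
were divided into approximately equal subgroups and a t-test applied. All
data values and names are hypothetical.

```python
import math
from statistics import mean, median, variance


def pooled_t(group_a, group_b):
    """Student's t for two independent groups using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))


# Hypothetical (outcome-measure reliability, ability-outcome r) pairs.
records = [
    (0.70, 0.35), (0.75, 0.42), (0.78, 0.40), (0.82, 0.50),
    (0.85, 0.55), (0.88, 0.52), (0.90, 0.60), (0.92, 0.58),
]

# Dichotomize the study variable into two roughly equal subgroups.
cut = median(reliability for reliability, _ in records)
low = [r for reliability, r in records if reliability < cut]
high = [r for reliability, r in records if reliability >= cut]

# Compare the subgroup means; judge t against a table value for
# len(low) + len(high) - 2 degrees of freedom.
t_value = pooled_t(high, low)
print(round(t_value, 2))
```
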

In addition to the t-tests reported in Table 3, correlations of
continuous study variables with associated ability-cognitive outcome
correlations were computed and are reported in Table 4. Grade level and the
reliabilities of the ability and outcome measures show positive relationships
with ability-cognitive outcome correlations, but only the reliability of the
outcome measure reached significance (p < .05).

Insert Table 4 about here

Discussion and Conclusions 

The five ability-attitude outcome correlations gave a mean of .27 with
a standard deviation of .07, while the sixty-two ability-cognitive outcome
correlations had a mean of .48 with a standard deviation of .15. Clearly,
ability predicts cognitive outcomes better than attitudinal outcomes, a
finding which is not surprising given the cognitive character of ability
measures.

Regarding the ability-cognitive outcome correlations, the consistency
in correlational means regardless of the ability or cognitive outcome
category gave a solid estimate of the degree to which ability is associated
with cognitive learning in grades 6 through 12. The .48 mean correlation
translates into 23 percent of the variance in cognitive learning accounted for
by ability. The standard deviation of .15 means that in 2 out of 3 studies,
the variance in cognitive learning accounted for by ability was somewhere
between 11 and 40 percent. The stability of the standard deviation, and
thus of this estimate of variance accounted for, is evident with examination
of the SD columns in Table 3, where 19 of 20 SD's are in the range of .13
to .17.
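
The percentages in this paragraph follow from squaring the correlation.
The sketch below squares the mean correlation and the values one standard
deviation below and above it; reading "2 out of 3 studies" as the band of
one standard deviation on either side of the mean assumes roughly normally
distributed correlations.

```python
mean_r, sd_r = 0.48, 0.15


def percent_variance(r):
    """Percent of variance accounted for by a correlation r."""
    return 100 * r ** 2


central = percent_variance(mean_r)       # variance accounted for at the mean
lower = percent_variance(mean_r - sd_r)  # one SD below the mean, r = .33
upper = percent_variance(mean_r + sd_r)  # one SD above the mean, r = .63

print(round(central), round(lower), round(upper))  # 23 11 40
```
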
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According to t-test (Table 3) and correlational (Table 4) results, only 
one study variable had a significant (p < .05) effect on the size of the
ability-cognitive outcome correlation, while other variables had a minor or
no impact. The reliability of the outcome measure had the greatest impact,
accounting for 11 percent (r = .33) of the variance in the ability-cognitive
outcome correlation. This finding is probably related to the higher
correlations associated with published outcome measures. The use of published
ability measures of high reliability also raised the correlations, although
statistical significance was not attained. Both of the above findings are in
agreement with the well known tendency for correlations to rise as the
reliability of measurement improves; e.g., Iverson and Walberg (1979) found
the correlations of the home environment with school learning increased with
the reliability of the outcome measure. The correction for attenuation formula
(Thorndike and Hagen, 1977) was developed to correct correlations for this
effect. Study variables having no systematic impact on the ability-cognitive
outcome correlations were sample size, subject matter, group ability level,
and time elapsing between measures.
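
As an illustration of the correction for attenuation mentioned above, the following sketch applies the classical formula, in which the observed correlation is divided by the square root of the product of the two reliabilities. The reliabilities used here are hypothetical values chosen only for the example; the report does not apply this correction to its .48 estimate.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores from an observed
    correlation r_xy and the reliabilities of the two measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


observed_r = 0.48           # mean correlation reported in this synthesis
ability_reliability = 0.90  # hypothetical, for illustration only
outcome_reliability = 0.80  # hypothetical, for illustration only

corrected = correct_for_attenuation(
    observed_r, ability_reliability, outcome_reliability
)
print(round(corrected, 2))
```

The lower the reliabilities, the larger the upward correction, which is consistent with the observed tendency for correlations to rise as measurement improves.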

The primary methodological flaw throughout the 34 studies was the use of
convenience sampling, which is related to the primary reporting flaw of not
sufficiently identifying the population under study. No study provided
population parameters of ethnic composition, urban-rural community type, and
SES level along with evidence of random sampling of the population. Without
this information, generalization of the findings from any individual study
is greatly limited. If it is assumed, however, that there is randomness of
selection of groups studied across the 34 studies synthesized, then the
reported .48 correlation is representative of the grade 6 to 12 population,
almost exclusively, however, in the United States. This assumption might be
questioned given the location of institutions conducting educational research
and the tendency of researchers to choose convenient and accessible schools,
often in university communities or under some kind of university influence.

To cross-validate the .48 general estimate of the ability-science
cognitive outcome correlation found in this study, Educational Achievement in
Relation to Intelligence (St. John, 1930) and the Project Talent study
(Flanagan et al., 1964) were consulted to find if comparable correlations
had been reported. St. John identified eight studies containing 16
correlations between intelligence test scores and teachers' marks in natural
science in secondary and higher grades. The mean correlation reported was
.46.

Project Talent did not report an ability test score but did identify
an a priori IQ composite consisting of Reading Comprehension, Abstract
Reasoning, and Mathematics I test scores. The mean correlation of the IQ
composite with Physical Science and Biological Science test scores for grades
9 through 12 was .51.

The two estimates of .46 and .51 are in excellent agreement with the
finding of .48 in this study. The congruence of these estimates is even
stronger if reliabilities of measures are considered. It can be assumed that
teachers' marks will have lower reliabilities than the average of outcome
measures used in the 34 studies which yielded the .48 correlation, whereas
Project Talent measures reported reliabilities higher than this average.

Implications

The tenet that ability and past learning are among the best predictors
of future learning is well established among educational researchers and
practitioners. What is less well established is the degree to which this
tenet is true for different subject areas under different study conditions.
The estimates developed in this study should provide the researcher in science
education with a guide for estimating the influence of ability on science
learning in untested populations, as well as a norm for comparing new
findings on the extent to which various factors influence learning.
Educational practitioners will find the results of value in moderating their
a priori judgments on placement of students in ability groups or raising or
lowering expectations for individual students based solely on test scores.
The results of this study highlight the fact that measured ability, on
average, does not account for a great amount of variance in science learning.
Several other factors are known to influence learning and thus compensate for
ability differences. Major among these other factors are student motivation;
the quality and quantity of instruction; and home, peer and classroom social
environments. As improved estimates of the effects of these other factors on
science learning become available, science education research and teaching
practice can be directed at optimizing those influences most potent in
improving science learning, keeping the less manipulable ability factor in
proper perspective.
Table 1
Ability and Outcome Categories and Measures



Category              Definition                                    Example Measures (1)

General Ability       General, verbal, or subject matter            Lorge-Thorndike (Johnson, 1969)
                      specific aptitude or ability                  SAT Verbal (Wasik, 1971)
                                                                    IQ from school records (Hardy, 1970)

Prior Achievement     Past general or subject matter specific       Gr. 9 math achievement (Rotlpan, 1966)
                      science achievement or knowledge              Nelson Biology Test (Schock, 1973)
                                                                    SRA Battery (Sheehan, 197?)

Quantitative-         Quantitative, mechanical or spatial           ITED Quantitative (Benson & Howell, 1968)
Spatial Ability       ability or reasoning except where             DAT Mechanical Reas. (Tanner, 1969)
                      specifically based on Piagetian tasks         NFER Spatial Test (Marjoribanks, 1978)
                      or logical operations

Factual Learning      Recognition or recall of specific             Retention Test (Holliday & Brunner, 1977)
                      information, e.g., facts, names,              Environmental Info. Test (Hart, 1978)
                      definitions                                   Biology Info. Test (Tamir & Jungwirth, 1975)

Product Learning      Requires generalization or application        BSCS Comp. Final (Engen & Smith, 1968)
                      of concept(s) to new situations. May          ACS Chem. Exam (Jones, 1963)
                      also include factual items as in              SCCT Science Comp. (Raven & Polanski, 1974)
                      standardized achievement tests. Not
                      identified in study report as process
                      or factual outcome.

Process Learning      Requires use of thought processes or          Controlling Variables (Bredderman, 1973)
                      logical operations associated with            Watson-Glaser TCT (George, 1968)
                      scientific thinking, e.g., hypothesizing,     Science Process Inv. (Welch & Pella, 1968)
                      controlling variables. Must be identified
                      in study report as such a measure.

Attitudinal Learning  Attitudes toward or interests in              Science Attitude Scale (Engen & Smith, 1968)
                      scientists, science careers, science          Environmental Attitude (Hart, 1978)
                      instruction                                   Inventory of Science Attitudes (Swan, 1966)

(1) Parentheses contain a study using this measure. Studies are listed in the appendix.
Table 2
Ability - Cognitive Outcome
Mean Correlation Matrix

                                      Cognitive Outcome
Ability                  Factual     Product     Process     Combined

General Ability          .46 (5)     .49 (16)    .49 (13)    .49 (34)
Prior Achievement        .41 (1)     .48 (11)    .42 (7)     .46 (19)
Quantitative-Spatial     .49 (3)     .53 (6)         (0)     .51 (9)
Combined                 .46 (9)     .50 (33)    .46 (20)    .48 (62)

Notes: Parentheses contain number of study-median correlations used to
compute the mean.

Two-way ANOVA (ability by outcome) yielded no significant main effects
or interactions.
Table 3
Means, Standard Deviations, and t-test



Comparisons of Subgroups of Studies

                                                             Subgroup 1           Subgroup 2
Study Variable                  Subgroups Compared         n   Mean r   SD      n   Mean r   SD       t      p
                                (1 vs 2)

Sample Size                     n < 200 vs n > 200        31    .46    .16     31    .50    .14    1.10    .28
Grade Level                     5-9 vs 10-12              30    .47    .14     32    .49    .17     .47    .54
Subject Matter (a)              Physical Science vs       22    .47    .12     24    .50    .17     .71
                                Life and Earth Sci.
Group Ability                   High and Above Average    25    .49    .16     31    .47    .16     .24    .81
                                vs Average
Experimental Intervention       yes vs no                 30    .45    .13     32    .51    .17    1.48    .14
Between Meas.
Reliability of Ability Measure  R < .90 vs R ≥ .90        25    .45    .14     20    .53    .15    1.68    .10
Reliability of Outcome Measure  R < .80 vs R > .80        27    .42    .15     13    .55    .14    2.83    .01
Source of Ability Measure       Local vs Published        11    .44    .13     48    .50    .15    1.70    .10
Source of Outcome Measure       Local vs Published        29    .45    .08     33    .51    .16    1.48    .14
Time Between Measures           Time < 4 wk vs            20    .48    .14     40    .48    .16     .02    .99
                                Time > 4 wk

Note: Dependent variable is the median ability-cognitive outcome correlation per study for each cell in Table 2.

(a) Physical science is physics, chemistry or physical science; life science is biology or life science.
Table 4

Correlations of Continuous Study Variables with
Ability-Cognitive Outcome, Study-Median Correlations

Study Variable                      n       r       p

Sample Size                        62     -.01     .48
Grade Level                        62      .07     .29
Reliability of Ability Measure     45      .12     .21
Reliability of Outcome Measure     39      .33     .02
Time Between Measures              62      .01     .48
Age, Development and Science Learning

Abstract

Over the past decade, developmental theory has occupied a central role
in science education instructional theory and empirical research. The
purpose of the present study is to quantitatively synthesize studies relating
age (or grade) and developmental level to science learning among grade 6-12
students over the 1963-1978 period. Twenty-seven studies were reviewed.
Annual increments observed in measures of developmental level were consistent
with current theory, and annual increments in cognitive achievement were
relatively constant over the grade 4-9 interval. Measures of student ability
were found to be better predictors of cognitive achievement than
developmental measures; and age and grade level were weakly related to
developmental level and cognitive achievement, only showing significant
correlations across grade levels.
Age and Developmental Level as
Antecedents of Science Learning

Over the past decade, developmental theory has occupied a central role
in science education instructional theory and empirical research. Each annual
Summary of Research in Science Education, e.g., Petersen and Carlson (1979),
over the 1973-1977 period devoted a separate section to this area of research
and focused almost exclusively on Piagetian-based studies. Chiappetta (1976)
and Levine and Linn (1977) conducted multi-year, qualitative reviews of
Piagetian-related science education literature. These included descriptive
studies on the general developmental level of various components of the
population and on the relationship of training studies to development and
achievement. Other than the occasional count of studies reporting a certain
kind of result, and the listing of percentages of persons at various
developmental stages, no attempt has been made to provide a quantitative
synthesis of the findings of related studies.

A quantitative synthesis of studies has the advantages of a more objective
process for summarizing each study and a more concise means of displaying and
interpreting trends than a qualitative approach. The objectivity arises from
the use of a numerical coding scheme that provides for ease of replication and
tests of agreement among raters. The quantitative summary of the studies allows
tables and graphs for concise presentations as well as the use of descriptive
statistics.

Another value of quantitative synthesis is the comparability of findings 
relating different independent or predictor variables to a common dependent variable. 
For example, the question of whether measures of ability or of developmental level 
are, in general, better predictors of science achievement could be addressed 
through quantitative synthesis. Additional comparisons might be made with
other major influences on science achievement. 

Walberg (in press) has identified eight constructs in the general
education literature as substantially related to learning. The eight
constructs are: student ability, motivation, age or developmental level;
quality and quantity of instruction; and home, peer, and classroom
environments. The relationship of the constructs to student learning in
science is the theme of other quantitative syntheses concurrent to the present
study. This comprehensive view of influences on science learning based on
general education literature is in harmony with the recommendations of the
NARST - NIE Commission on Research Priorities in Science Education (Yager,
1978). A general report of findings in all construct areas is in preparation
(Walberg, Boulanger, Kremer & Haertel, in preparation).

Purpose and Scope 

The purpose of the present study is to quantitatively synthesize studies
relating age (or grade) and developmental level to science learning among
grade 6-12 students over the 1963-1978 period. The grade levels 6-12 were
chosen to focus on that interval in the school curriculum that typically
begins with the first required junior-high school science courses and
concludes with elective senior-high school courses. This interval is also
characterized in Piagetian theory as the period of transition from concrete to
formal thinking. The science education literature of the 1963-1978 time period
reveals the emergence of the developmental perspective in educational
psychology and the most recent growth period in the quantity and quality of
science education research.
Methodology

The assumptions and procedures advocated by Glass (1978) were adopted
for this synthesis. Glass argues that all studies have flaws and limitations
and that combined results give better estimates of outcomes and trends than
any single flawed study. A weakness in one study is often balanced by a
strength in another; an effect or relationship persisting across diverse
studies on a variety of populations is more robust than any single result.

In the present synthesis, the individual study results of interest were
either correlations or effect sizes. Zero-order correlations between similar
predictor variables and similar outcome variables were recorded as data points
for analysis. Where two age levels or, more often, two grade levels were
compared, an effect size (ES) was calculated using one of two formulas:

$$ES = \frac{\bar{X}_H - \bar{X}_L}{S_H} \qquad \text{or} \qquad ES = t\sqrt{\frac{1}{n_H} + \frac{1}{n_L}}$$

$\bar{X}_H$ and $\bar{X}_L$ are the dependent variable means of the higher and
the lower grades respectively. $S_H$ is the standard deviation of the higher
grade scores, $t$ is the computed t-test statistic and the $n$'s are group
sizes. An F-ratio comparing two groups was considered equal to $t^2$, and
$\sqrt{MS_w}$ was considered equal to $S_H$. F-ratios based on comparisons of
more than two groups were not used in computing effect sizes.
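
The two effect size formulas, and the treatment of a two-group F-ratio as t squared, can be sketched as follows. This is a minimal illustration under the definitions given above; the function names and all numerical inputs are hypothetical and are not taken from any of the synthesized studies.

```python
import math


def effect_size_from_means(mean_high, mean_low, sd_high):
    """Mean difference expressed in units of the higher grade's SD."""
    return (mean_high - mean_low) / sd_high


def effect_size_from_t(t, n_high, n_low):
    """Effect size recovered from a two-group t statistic and group sizes."""
    return t * math.sqrt(1.0 / n_high + 1.0 / n_low)


def effect_size_from_f(f_ratio, n_high, n_low):
    """A two-group F-ratio is treated as t squared. Squaring loses the sign,
    so a gain for the higher grade is assumed here."""
    return effect_size_from_t(math.sqrt(f_ratio), n_high, n_low)


# Hypothetical inputs chosen so that all three routes agree.
es_means = effect_size_from_means(54.0, 50.0, 10.0)
es_t = effect_size_from_t(2.0, 50, 50)
es_f = effect_size_from_f(4.0, 50, 50)
print(round(es_means, 2), round(es_t, 2), round(es_f, 2))
```

The t-based route is useful when a study reports only the test statistic and group sizes rather than means and standard deviations.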

Literature Search and Selection 



The goal of the literature search and selection was to identify two kinds
of grade 6 through 12 studies in the 1963 through 1978 science education
literature: 1) studies that reported a correlation of age, grade, or
developmental level with some measure of science learning, and 2) studies that
reported measures of developmental level or science learning at two different
grade levels in a manner that allowed computation of an effect size. The
search had three components: scanning of all available ERIC annual reviews of
science education research for the period; a study-by-study search of all
1963-1978 volumes of the Journal of Research in Science Teaching and Science
Education; and a computer search of Dissertation Abstracts and Social Sciences
Citation Index for the period in question.

Since the literature search revealed only Piagetian-based studies, the
definition of developmental level was limited to any measure of Piagetian
stage or related logical operations whether obtained via interview techniques
(e.g., Lawson & Blake, 1976) or other measure validated against Piagetian
theory (e.g., Raven & Polanski, 1974). Among the studies meeting the selection
criteria, developmental level appeared as a predictor variable for cognitive
achievement, as a criterion variable for age or grade predictors, and as a
dependent variable in grade level comparisons. Cognitive achievement was
defined as any measure of factual and/or conceptual learning of science
content, while science process learning was restricted to scores on the
Science Process Inventory (Welch & Pella, 1968). The above definitions of
developmental level, cognitive achievement, and science process learning
evolved with the selection and coding of studies. A total of 27 studies met
the selection criteria.

Coding 

All assembled studies were numerically coded according to the following
study variables: the type, source and reliability of independent and
dependent measures; grade level, ability level, and science subject area of
the sample; the ethnic, urban-rural, and SES character of the community; the
design of the study, unit of analysis, and methodological flaws; and reported
correlations or computed effect sizes. In total, over forty study variables
were recorded for each study on prepared code sheets. An independent check by
a second researcher of the reliability of coding routinely revealed about 90
percent agreement. Codebook, codesheets, and raw data are available in the
project final report (Walberg, Boulanger, Kremer & Haertel, in preparation).
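
A coding reliability check of the kind described above can be sketched as a simple percent agreement between two coders. The report does not state how agreement was computed, so the item-by-item comparison below is an assumption, and the codes shown are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percent of study variables on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same set of variables")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for ten study variables from one study.
first_coder = [1, 3, 2, 2, 1, 4, 0, 1, 2, 3]
second_coder = [1, 3, 2, 2, 1, 4, 0, 1, 2, 1]

agreement = percent_agreement(first_coder, second_coder)
print(agreement)
```

Percent agreement does not adjust for agreement expected by chance, so it is best read as a rough check rather than a formal reliability coefficient.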

Analysis and Results 
Study code sheets were sorted in terms of similarity of independent and
dependent variables and type of summary statistic, i.e., correlation or effect
size. The resulting five classifications and associated summary tables are:
correlations of developmental level with cognitive achievement (seven studies
in Table 1); correlations of age or grade with developmental level or cognitive
achievement (six studies in Table 2); and grade level comparisons (effect
sizes) in terms of developmental level, cognitive achievement, and science
processes (15 studies, all in Table 3). If a study reported more than one
effect size for a given dependent variable category and grade level
comparison, the median effect size was identified for later analysis.
Likewise, if a study reported more than one correlation in a given predictor
and criterion category, the median correlation was selected for analysis.
Median values were used to ensure independence, since multiple effect sizes or
correlations from the same study population would be related. An annotated
bibliography of studies by category is provided in the Appendix.
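
The median-per-study rule described above can be sketched as a two-step reduction: first collapse each study's multiple correlations in a category to a single median, then average the study medians. The study labels, grades, and correlations below are hypothetical and serve only to show the procedure.

```python
from collections import defaultdict
from statistics import mean, median

# (study, grade, correlation) records; every value here is hypothetical.
records = [
    ("Study A", 7, 0.25), ("Study A", 7, 0.31), ("Study A", 7, 0.40),
    ("Study B", 7, 0.22),
    ("Study C", 9, 0.60), ("Study C", 9, 0.66),
]

# Step 1: one median per study within each category (here, grade level).
by_study_cell = defaultdict(list)
for study, grade, r in records:
    by_study_cell[(study, grade)].append(r)
study_medians = {key: median(vals) for key, vals in by_study_cell.items()}

# Step 2: average the independent study medians within each category.
by_grade = defaultdict(list)
for (study, grade), r in study_medians.items():
    by_grade[grade].append(r)
grade_means = {grade: mean(vals) for grade, vals in by_grade.items()}

print(grade_means)
```

Because each study contributes one value per category, a study that reports many correlations does not outweigh a study that reports one.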

Insert Tables 1, 2 & 3 about here


Mean correlations of developmental level with cognitive achievement (Table 1)
rise from .28 in grade seven to .63 in grade 9, and decline to .32 in grade 12.
The grand mean is .40 with a standard deviation of .11. Only one study
(Sayre & Ball, 1975) reported correlations at each grade, 7-12, based on
the same measures, which were Piagetian interviews (developmental level)
and student grades in science (cognitive achievement). Figure 1 is a plot
of the Table 1 mean values against grade level and the Sayre and Ball data
against grade level. The plot indicates that the trend in the Sayre and
Ball data is maintained by the other studies in grades 10 through 12.



Insert Figure 1 about here

The grade 7 through 9 correlations are based on data from required
junior high courses, while grade 10 and 11 mean correlations are from three
biology course-related and three chemistry course-related situations,
respectively. The grade 12 data is from one physics based study and from a
group of British fifth and sixth form students.

It might be hypothesized, from a developmental perspective, that the
increasing correlation over grade 7 through 9 required courses is due to
differing developmental rates causing an increase in variation within classes
as they move from seventh to ninth grade. The decline in correlations from
grade 10 through 12 most likely is due to the self-selection of students in
the elective advanced science courses, diminishing the variation within
classes by removal of the cognitively less-developed and lower-achieving
students. However, both explanations are vulnerable to competing
interpretations, such as changes in interest and motivational factors which
influence performance on both developmental and cognitive achievement
measures.

The correlations in Table 2 of age or grade with developmental level
range from .00 when based on the ages of grade 11 science students, to .57
when data spanning six grade levels (grade 4 through 9) is included. These
correlations emphasize the inappropriateness of strongly associating age or
grade with levels of intellectual development or ability to use logical
operations. The .57 correlation would mean that only about 30 percent of the
variation in developmental level across the developmentally diverse series
of grade levels is accounted for by grade or chronological age. As will be
seen in the next section, the low correlations of grade with developmental
level may be explained by the fact that within-class variation is greater
than between-class variation. Table 2 also indicates that age or grade level
is a poor predictor of cognitive achievement.

When studying the calculated effect sizes in Table 3, it should be noted
that a mean effect size comparing one grade level to the next is simply the
difference in means between the lower and higher grade converted into standard
deviation units. The distribution of the higher grade's score is assumed to be
normal and the lower grade's mean is to the left of the higher grade's central
mean on the normal curve by the amount of the effect size.

The grade comparison mean effect sizes presented in Table 3 are best
visualized by plotting the cumulative mean effect size against grade level. The
incremental effect size to be added each year is based on the average of the
mean effect sizes which apply to the grade interval in question. For example,
examination of the first entries in the far right column in Table 3 in
conjunction with the far left column will indicate that .261 and .399 yearly
increments both apply to the grade 5 to 6 interval and thus should be averaged
when plotting the grade 5 to 6 increment. Following this method of calculation,
Figure 2 parts a, b, and c display three plots of the cumulative effect sizes
of developmental level, cognitive achievement, and science process learning
respectively over the grade levels in the data being plotted.

Insert Figure 2 about here 
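
The averaging and cumulating procedure used to build the Figure 2 plots can be sketched as follows. Only the two grade 5 to 6 increments (.261 and .399) come from the text; the remaining increments are hypothetical placeholders, and the data layout is an assumption made for illustration.

```python
from collections import defaultdict

# Each entry: (lower grade, higher grade, yearly effect size increment).
increments = [
    (5, 6, 0.261),  # from the text
    (5, 6, 0.399),  # from the text
    (6, 7, 0.30),   # hypothetical
    (7, 8, 0.35),   # hypothetical
]

# Group the increments that apply to the same one-year grade interval.
by_interval = defaultdict(list)
for low, high, es in increments:
    by_interval[(low, high)].append(es)

# Average within each interval, then accumulate across successive grades.
cumulative = []
running_total = 0.0
for interval in sorted(by_interval):
    values = by_interval[interval]
    running_total += sum(values) / len(values)
    cumulative.append((interval[1], round(running_total, 3)))

print(cumulative)
```

Each pair in the result gives a grade and the cumulative effect size reached by that grade, which is what would be plotted on the vertical axis.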

While inspecting Figure 2, certain quantities and trends should be
noted. Based on six and ten studies respectively, both Figure 2a and
Figure 2b show fairly smooth curves, with gradually increasing increments
in the case of developmental level and relatively constant increments in
the case of cognitive achievement. The increasing developmental increments
are in correspondence with developmental theory, which poses a concrete
operational to formal operational transition beginning about grade six for
many children. Even if individual transitions were fairly sharp for most
children, group data would show only a gradual upward swing of the mean
accompanied by the increased variation noted earlier in the correlational
results. The linearity of the cognitive achievement cumulative effect size
over the same grade intervals suggests that the developmental upward swing
is not simply an artifact of increasing achievement.

A second trend worth noting is the relationship between within-class
and between-class variation. Developmental effect size increments sum to
.932 between grades 4 and 7. This means that the average seventh grade
student is approximately one standard deviation above the average fourth
grade student on developmental level measures. Thus, the upper 16 percent
of the fourth grade is developmentally above the median level of the seventh
grade. The between-class variation is small compared to the within-class
variation. Similar statements can be made about cognitive achievement; e.g.,
a change of nearly four grade level mean values is analogous to a change of
one standard deviation (one effect size unit) of within-class variation.
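
The overlap statement above can be checked with a small sketch. It assumes normally distributed scores with equal standard deviations in the two grades, which the text implies but does not state, and the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_of_lower_grade_above_higher_mean(effect_size):
    """Percent of the lower grade scoring above the higher grade's mean."""
    return 100.0 * (1.0 - normal_cdf(effect_size))


# A gap of exactly one standard deviation, as approximated in the text.
one_sd = percent_of_lower_grade_above_higher_mean(1.0)
# The summed increment actually reported between grades 4 and 7.
reported = percent_of_lower_grade_above_higher_mean(0.932)

print(round(one_sd), round(reported))
```

Under these assumptions the one standard deviation approximation gives the 16 percent quoted in the text, while the reported .932 sum gives a slightly larger share of about 18 percent.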

The cumulative effect size of science process learning against
grade (Figure 2c) is more irregular than the other two plots. The
large gain (.754) in the grade 9 to 10 interval is based on one study
(Tamir, 1972) where knowledge of science processes was measured at the
end of each grade level year. The gain is largely a measure of the
effects of tenth grade science, the character of which is unclear from the
study report. The irregularity of the plot in general may be an artifact
of combining results of only two studies (Tamir, 1972, and Welch and Pella,
1968) conducted in quite different educational systems (Israel and Wisconsin,
respectively).

The mean annual effect size increments for the three Figure 2 plots
are: developmental level, .36; cognitive achievement, .28; and science
process learning, .43. Expressed as percentiles, the increments indicate
the approximate advance of the mean class score each year from the previous
year's 50th percentile point. Average yearly percentile increases would be:
developmental level, 14; cognitive achievement, 11; and science process
learning, 17.
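
The conversion from effect size increments to percentile increases can be reproduced with the sketch below. It assumes a normal distribution and measures the gain from the 50th percentile, as the paragraph above describes; the function name is illustrative.

```python
import math


def percentile_gain(effect_size):
    """Percentile points gained above the 50th percentile of the previous
    year's distribution, assuming normally distributed scores."""
    cdf = 0.5 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))
    return 100.0 * cdf - 50.0


annual_increments = {
    "developmental level": 0.36,
    "cognitive achievement": 0.28,
    "science process learning": 0.43,
}

gains = {name: round(percentile_gain(es)) for name, es in annual_increments.items()}
print(gains)
```

The rounded values agree with the 14, 11, and 17 percentile points reported in the text.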

The analysis to this point has focused only on correlations and effect
sizes and their relationship to grade levels. Additional information about
each study was coded to provide normative values and to determine if study
variables such as instrument reliability, sample size, etc. had any
across-study systematic influence on correlations or effect sizes.

The reliabilities of cognitive achievement, developmental level and
science process measures were comparable in average values (.73, .72 and
.76 respectively) and were unrelated to either correlation or effect size
values. The ranges of reported reliabilities were .49 - .98, .50 - .92,
and .74 - .79 respectively. Only 12 of the total 27 studies reported
instrument reliabilities.

All developmental level measures were research measures, i.e., not
standardized over any representative local, regional or national sample.
Written measures of logical operations, e.g., Raven and Polanski (1974), were,
in general, more reliable than task based measures, e.g., Lawson and Blake
(1976). Four studies with written measures yielded a mean reliability of .79,
while five studies with task measures yielded .67. Assessing the validity of
either kind of measure is difficult since both deviate from the Piagetian
clinical approach and are analyzed in terms of parametric statistics; yet, the
content of both kinds of measures is founded in Piagetian theory.

Among correlational studies relating developmental level to cognitive
achievement, average or heterogeneous groups registered higher correlations
(eight correlations with mean of .45) than high ability groups (four
correlations with mean of .31). This trend is related to the self-selection
in higher grade levels referred to earlier. The high ability groups are all
in elective eleventh and twelfth grade courses.

Several study variables, e.g., population demographics, were too
infrequently reported for analysis. Sample size was reported for all studies
but bore no relationship to correlation or effect size values.

Threats to the validity of study designs were primarily of two kinds:
convenience sampling, which threatened generalizability, and use of
cross-sectional data in grade level comparisons. No longitudinal study was
found which traced the development of a group of students over a period of
time (other than pre and post measures bracketing an instructional treatment).
Developmentally related instructional studies are reported in Boulanger,
1979b.

The usual caution in the interpretation of all tables, plots, and
quantitative values presented above is appropriate here. All figures
and interpretations are based on a relatively small number of diverse
studies. The case for this kind of quantitative synthesis rests on the
argument that the combined results carry more general validity than any
single study, as well as showing trends not apparent in studies considered
singly. All the above interpretations should be considered hypotheses for
further investigation; all average correlations and effect sizes should be
considered as only tentative norms based upon data available in the
1963-1978 period.

Discussion 

The grand mean correlation of .40 between developmental level and
cognitive achievement might be compared to the correlation between ability
measures and cognitive achievement reported in another research synthesis
(Boulanger, 1979a). Ability was defined as any measure of prior achievement,
general ability or quantitative-spatial ability. The mean correlation
between ability and cognitive achievement was .48 with a standard deviation
of .15, significantly (p < .01) higher than the developmental-level-as-predictor
correlation reported in this study. Since general ability or prior
achievement measures are usually available in school records, administering
time-consuming developmental measures for achievement prediction, in general,
makes little sense unless it can be shown that developmental measures account
for significant amounts of unique variance not accounted for by ability
measures. However, the more common defense for the use of developmental
measures is the diagnostic value of knowing student capabilities in the
various kinds of theory-related logical operations. Ability measures may tap
many of the same skills, but developmental measures make logical operations
and student weaknesses in applying them more explicit on an individual basis.

Another research synthesis (Boulanger, 1979b), which examined the
effects of training in scientific thinking skills, has implications for the
findings of this study. In the present study, the annual mean percentile
gain in developmental level was found to be approximately 14 percentile points
in the grade 4-9 interval, with the annual increment increasing in the higher
grades (Figure 2a). Based on 11 studies of training in scientific thinking, 9
of them training in Piagetian logical operations, a mean effect size of .89
or 30 percentile points was found when trained groups were compared with
untrained control groups. These training effects occurred in grades 5 through
9 primarily as a result of short term (two to ten hours) tutorial type
training of individual students by special teachers. Long term effects of the
training were not investigated in the studies; but the studies strongly
suggest that the annual increments in such developmentally related traits as
logical reasoning patterns can be increased with appropriate instruction.

Summary 

Twenty-seven studies were identified in the 1963-1978 science education
research on students in grades 6 through 12. The studies related age or grade,
developmental level, and science learning in terms of either correlations or
computed effect sizes. Major findings were:

a) The mean within] grade level correlation of developmental level with cog- 

j 

nitive achievement is .40, with individual grade l§vel correlations reach- 
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ing a maximum in grade nine. 

b) Ages and grade level are weakly related to developmental level and 
cognitive achievement, only showing significant correlations when 
computed across several grade levels. 

c) AnnueLL increments in dev.elopmental level effect size average .36 
(14 percentile paints) and increase over the grade 4-9 interval in 
agreement with developmental theory. Training studies reported else- 
where indicate that it may be possible to increase these increments 
through carefully designed instruction. 

d) Annual increments in cognitive achievement are relatively constant at 
an average of .25 (10 percentile points) over the grade 4-9 interval. 

e) Ability measures are better predictors of cognitive achievement than 
are developmental measures. 

Recommendations 

Piagetian-based developmental measures are founded in hypothesized
intellectual structures and operations which emerge in stages over the
years of childhood and adolescence. Traditional ability measures are
norm-referenced and are founded in observed reasoning skills, often in the
context of culturally defined situations. Both kinds of measures correlate
with culturally defined cognitive achievement. To sort out the unique
contribution of each kind of measure to the prediction of science
learning, both should be administered and later related to both cognitive
achievement (as defined in this study) and developmental growth. A
longitudinal series of such measures over a period of years would allow the
tracking of both individual and group absolute progress in intellectual
development and relative standing on ability and achievement measures.
This would allow verification of the correlational and effect size trends
described earlier. Planned, developmentally oriented instructional
interventions with selected subsamples would provide time series data on
the short- and long-term value of such interventions for both development
and achievement.

The weak point in the above plan is the present set of developmental
measures. A first research priority is the creation of a series of valid
and reliable developmental measures which provide quantitative indicators
of developmental level comparable over the full range of developmental
stages. The measures should account for significant unique variance in
science learning when compared to ability measures in order to justify the
time and expense of administration. The measures should also possess
diagnostic properties to provide direction to the developmental aspects of
subsequent instruction.
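
Whether a developmental measure adds unique variance beyond an ability
measure can be checked from three correlations alone when there are just
two predictors. The Python sketch below uses the standard two-predictor
formula for the squared multiple correlation and subtracts the variance
explained by ability alone. The function name and the correlations passed
to it are hypothetical, chosen only for illustration, and are not values
from the studies reviewed.

```python
def unique_variance(r_ach_dev, r_ach_abil, r_dev_abil):
    """Share of achievement variance explained by the developmental
    measure over and above the ability measure (two-predictor case)."""
    if abs(r_dev_abil) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    # Squared multiple correlation with both predictors entered.
    r_squared_full = (
        r_ach_dev ** 2 + r_ach_abil ** 2
        - 2.0 * r_ach_dev * r_ach_abil * r_dev_abil
    ) / (1.0 - r_dev_abil ** 2)
    # Subtract what the ability measure explains on its own.
    return r_squared_full - r_ach_abil ** 2


# Hypothetical correlations, not data from the report.
print(unique_variance(0.40, 0.50, 0.60))
```

With these illustrative inputs the developmental measure adds under two
percent of explained variance, the kind of result that would argue against
the time and expense of administration.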

Chronological age and school grade remain rough indicators of
developmental level and science learning and will continue to be routinely
recorded for a variety of organizational and cultural reasons. Age is
probably better related to physical maturity, general life experience, and
broad psychosocial life stages than to intellectual development and, even
less so, to science learning.
Table 1

Mean Correlation by Grade of Developmental
Level with Cognitive Achievement

             Number of Median        Mean
Grade        Correlations            Correlation

  7                1                    .28
  8                1                    .31
  9                1                    .63
 10                3                    .47
 11                3                    .36
 12                2                    .32

Grand Mean                              .40

a. One study (Leon, 1975) reported a correlation of .48
   based on combined grade 7-9 data.

b. The total number of studies represented in this table
   is seven. One study (Sayre & Ball, 1975) reported six
   correlations, one at each grade level.
Table 2

Correlation of Age or Grade with
Developmental Level or Cognitive Achievement

Grades                               Age or Grade Correlation
Included          Number of          with Devel. Level or
in Data           Correlations       Cogn. Achieve.

4,5,6                  1                   .01
4,5,6,7,8,9            1                   .57
4,6,8,10               1                   .39
7                      1                  -.03
11                     1                  -.11
11                     1                   .00

a. The total number of studies represented in this table is six.
Table 3

Grade Comparison
Mean Effect Size by Outcome

Grades         Number of Median      Mean            Mean Effect
Compared       Comparisons           Effect Size     Size Per Year

Developmental Level Outcome

4,6                 1                  .521             .261
5,7                 2                  .797             .399
6,8                                    .565             .283
7,9                 2                  .966             .483

Cognitive Achievement Outcome

4,6                 4                  .547             .274
5,7                 1                  .525             .263
7,9                 2                  .575             .288
9,10                2                  .142             .142

Science Process Outcome

9,10                1                  .754             .754
10,11               2                  .086             .086
11,12               2                  .442             .442

The total number of studies represented per section are:
Developmental Level, 6; Cognitive Achievement, 8; Science Process
Outcome, 2. One study appears in two sections; total for table is 15.
Figure 1

Developmental Level - Cognitive Achievement
Correlation versus Grade Level

[Figure: correlation of developmental level with cognitive achievement
(vertical axis, .0 to .7) plotted against grade level (horizontal axis).]

Note. Solid line connects Table 1 mean correlations. Dashed
      line connects data points from Sayre and Ball (1975),
      who reported a correlation for each grade.
Cumulative Effect Size Based on
Annual Grade Interval Effect Sizes

Figure 2a. Developmental level cumulative effect size (vertical axis,
    .0 to 1.8) versus grade level. Line segment labels, in order of
    increasing grade: .26(1), .33(2), .34(2), .38(2), .48(1).

Figure 2b. Cognitive achievement cumulative effect size versus grade
    level. Legible line segment label: .14(1).

Figure 2c. Science process cumulative effect size versus grade level
    (grades 9 to 12). Line segment labels, in order of increasing
    grade: .75(1), .09(2), .44(2).

Note. Number to the right of each line segment is the mean
      effect size increment with contributing number of
      values in parentheses.
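
The line segment labels in Figure 2a appear to follow from the
Developmental Level rows of Table 3: each two-grade comparison is divided
by its span to give a per-year effect size, and each annual grade interval
is then assigned the mean of the per-year values of the comparisons that
cover it. The Python sketch below reproduces the labels on that reading.
It is an inference from the reported numbers, not a procedure stated in
the text, and the function names are illustrative.

```python
from collections import defaultdict

# (lower grade, upper grade, mean effect size) from Table 3,
# Developmental Level Outcome.
COMPARISONS = [(4, 6, 0.521), (5, 7, 0.797), (6, 8, 0.565), (7, 9, 0.966)]


def annual_increments(comparisons):
    """Map each annual interval (keyed by its lower grade) to
    (mean per-year effect size, number of contributing values)."""
    per_interval = defaultdict(list)
    for lower, upper, effect_size in comparisons:
        per_year = effect_size / (upper - lower)
        for grade in range(lower, upper):
            per_interval[grade].append(per_year)
    return {
        grade: (sum(values) / len(values), len(values))
        for grade, values in sorted(per_interval.items())
    }


def cumulative_effect_sizes(increments):
    """Running total of the annual increments, keyed by the upper grade."""
    total = 0.0
    cumulative = {}
    for grade, (increment, _count) in increments.items():
        total += increment
        cumulative[grade + 1] = total
    return cumulative


increments = annual_increments(COMPARISONS)
for grade, (increment, count) in increments.items():
    print(f"grades {grade}-{grade + 1}: {increment:.2f}({count})")
print(cumulative_effect_sizes(increments))
```

On this reading the five annual increments average about .36, the value
given in the Summary, and the cumulative effect size at grade 9 is about
1.8, the top of the Figure 2a axis.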
Appendix

Annotated Bibliography of Studies on
Development and Science Learning

Effect Size Studies (Age/grade and developmental level)

Hammond, J. & Raven, R. The effects of a structured learning
     sequence on the achievement of compensatory tasks. Journal
     of Research in Science Teaching, 1973, 10, 257-262.
     55 grade 6-8 students, grouped into three ability levels
     (within grade), were randomly assigned to control and programmed
     instructional groups in compensatory operations. Experimental
     instructional groups scored higher on a post-test than did
     control groups.

Lawson, A. E. & Blake, A. J. D. Concrete and formal thinking abilities
     in high school students as measured by three separate
     instruments. Journal of Research in Science Teaching, 1976,
     13, 227-235.
     32 biology students were administered tasks measuring
     Piagetian stage, and a test of understanding of concrete
     and formal biology concepts. Performance on concepts tests
     varied significantly as a function of stage.

Lewis, W. R. The influence of age, sex, and school size upon the
     development of formal operational thought. Unpublished
     doctoral dissertation, University of Oklahoma, 1972.
     574 junior and senior high school students were individually
     administered six Piagetian tasks. Significant differences
     were observed between grades separated by two or more years,
     but no significant differences in groups separated by one grade.

Nous, A. & Raven, R. The effects of a structured learning sequence
     on children's correlative thinking about biological phenomena.
     Journal of Research in Science Teaching, 1973, 10, 251-255.
     246 grade 5, 7, & 9 students received identical instruction
     on correlational thinking. Performance on a post-test
     varied significantly with grade.

Raven, R. & Polanski, H. Relationships among Piaget's logical
     operations, science content comprehension, critical thinking,
     and creativity. Science Education, 1974, 58, 531-544.
     Performance of 220 grade 4 & 6 students was compared on
     tests of general science achievement and critical thinking.
     A significant difference between grade levels (favoring
     grade 6) was observed.

Raven, R. J. & Calvey, S. H. Achievement on a test of Piaget's
     operative comprehension as a function of process-oriented
     elementary school science programs. Science Education,
     1977, 61, 159-166.
     Performance of 249 grade 6 & 8 students on a test of logical
     operations was compared. A significant difference between
     grade levels (favoring grade 6) was observed.

Effect Size Studies (Age/grade and science process achievement)

Tamir, P. Understanding the process of science by students
     exposed to different science curricula in Israel. Journal
     of Research in Science Teaching, 1972, 9, 239-244.
     3500 Israeli grade 9-12 students were administered the
     Welch Science Process Inventory. Norms for Israeli students
     were established.

Welch, W. W. & Pella, M. O. The development of an instrument
     for inventorying knowledge of the processes of science.
     Journal of Research in Science Teaching, 1968, 5, 64-68.
     839 grade 10-12 students were administered a test of
     science processes (SPI). No significant differences
     between grade levels were observed.

Effect Size Studies (Age/grade and cognitive achievement)

Doran, R. L. Misconceptions of selected science concepts held
     by elementary school students. Journal of Research in Science
     Teaching, 1972, 9, 127-137.
     253 grade 2-6 students were administered a test of science
     misconceptions. Mean test scores increased with grade.

Kauchak, D., Eagen, D., & Kirk, S. The effect of cue specificity
     on learning from graphical materials in science. Journal of
     Research in Science Teaching, 1978, 15, 499-50?.
     82 grade 4-6 students were randomly assigned to three
     treatments: cued questions, non-cued questions, and generalizing
     questions in passages about plant growth. Performance
     increased significantly with grade.

Lawson, A. E. & Blake, A. J. D. Concrete and formal thinking
     abilities in high school students as measured by three separate
     instruments. Journal of Research in Science Teaching, 1976,
     13, 227-235.
     32 biology students were administered tasks measuring Piagetian
     stage, and a test of understanding of concrete and formal
     biology concepts. Performance on concepts tests varied
     significantly as a function of stage.

Pederson, A. A. & Jacobs, J. E. The effect of grade level on
     achievement in biology. Journal of Research in Science
     Teaching, 1976, 13, 237-240.
     Performance of 684 grade 9 & 10 biology students was compared
     on a local achievement test. No significant differences
     observed.

Pella, M. O. & Triezenberg, H. J. Three levels of abstraction of the
     concept of equilibrium and its use as an advance organizer.
     Journal of Research in Science Teaching, 1969, 6, 11-21.
     270 grade 7 & 9 students were randomly assigned to three advance
     organizer treatment groups. A significant difference in
     performance between grade levels on a test of factual knowledge
     was observed. No differences were observed among treatment
     groups.

Raven, R. & Polanski, H. Relationships among Piaget's logical
     operations, science content comprehension, critical thinking,
     and creativity. Science Education, 1974, 58, 531-544.
     Performance of 220 grade 4 & 6 students was compared on tests
     of general science achievement and critical thinking. A
     significant difference between grade levels (favoring grade 6)
     was observed.

Voelker, A. M. Elementary school children's attainment of the
     concepts of physical and chemical change--a replication.
     Journal of Research in Science Teaching, 1975, 12, 5-14.
     Performance of 40 grade 4-6 students on a post-test of
     concepts of physical and chemical change was compared
     (experimental and control groups within each grade had
     previously received instruction). A significant difference
     between grade levels (favoring grade 6) was observed.

Walters, L. L. Ninth vs. tenth grade biology--a comparison
     of achievement. Journal of Research in Science Teaching,
     1963, 1, 170-176.
     Performance of 144 grade 9 & 10 students on the Nelson
     Biology Test was compared. No significant differences
     were observed.

Correlational Studies (Developmental level and cognitive achievement)

Cantu, L. L. & Herron, J. D. Concrete and formal Piagetian stages and
     science concept attainment. Journal of Research in Science
     Teaching, 1978, 15, 135-143.
     [illegible] chemistry students identified as formal operational,
     and 12 as concrete operational, were administered tests of
     concrete and formal concepts following instruction.
     Formal operational students performed significantly better.

Fields, T. W. & Cropley, A. J. Cognitive style and science
     achievement. Journal of Research in Science Teaching,
     1969, 6, 2-10.
     178 fifth and sixth form students were administered tests
     of Piagetian operations and science achievement. Level of
     cognitive operations was found to be significantly
     correlated with achievement.

Lawson, A. E. & Nordland, F. H. Conservation reasoning ability
     and performance on BSCS blue version examinations. Journal
     of Research in Science Teaching, 1977, 14, 69-75.
     23 biology students were administered tests of Piagetian
     conservation and the BSCS achievement test. Significant
     correlation between test performance and conservation.

Lawson, A. E. & Blake, A. J. D. Concrete and formal thinking abilities
     in high school students as measured by three separate instruments.
     Journal of Research in Science Teaching, 1976, 13, 227-235.
     32 biology students were administered tasks measuring Piagetian
     stage, and a test of understanding of concrete and formal biology
     concepts. Performance on concepts tests varied significantly as
     a function of stage.

Leon, L. O. The principle of conservation or invariance and its
     relationship to achievement in science in the junior high school.
     ED 091 145, 1975.
     132 grade 7-9 students were administered the STEP test in science
     and a test for conservation of quantity. Significant correlation
     observed between ability to conserve and achievement in science.

Rubley, V. D. An investigation of formal thought and dogmatism
     during the transition between adolescence and adulthood.
     Unpublished doctoral dissertation, University of Iowa, 1972.
     60 high school chemistry students were administered Piagetian
     tasks and the ITED background in the natural sciences test.
     No correlation between age and test performance.

Sayre, S. & Ball, D. W. Piagetian cognitive development and achievement
     in science. Journal of Research in Science Teaching, 1975, 12,
     165-174.
     352 junior and senior high school science students were
     administered Piagetian tasks. Significant correlation between
     grade in science and task performance.

Correlational Studies (Age/grade with cognitive achievement and
developmental level)

Bredderman, T. Elementary school science experience and the ability to
     combine and control variables. Science Education, 1974, 58,
     457-469.
     30 grade 4, 6, 8 & 10 students were administered a test on
     controlling and combining variables. Significant correlation
     between age and test performance was found.

Gunnels, F. G. A study of the development in logical judgements in
     science of successful and unsuccessful problem solvers in grades
     four through nine. ED 026 249.
     Inferences drawn by students in grades 4-9 from science texts were
     related to Piagetian levels of thought. Older students and those
     at higher grade levels were found to operate more frequently at
     formal levels of operational thought.

Hardy, C. A. Chem study and traditional chemistry: an experimental
     analysis. Science Education, 1970, 54, 273-276.
     Performance of 208 chemistry students and traditional chemistry
     students was compared on tests of standardized achievement and
     critical thinking. Ability and past achievement significantly
     correlated with post-test chemistry achievement.

Nordland, F. H., Lawson, A. E., & Kahle, J. B. A study of levels of
     concrete and formal reasoning ability in disadvantaged junior and
     senior high school science students. Science Education, 1974, 58,
     569-575.
     96 minority junior high and 506 minority senior high science
     students were administered tests of Piagetian operations. No
     correlation between age and task performance.

Pella, M. O. & Ziegler, R. Use of mechanical models in teaching
     theoretical concepts. Journal of Research in Science Teaching,
     1967-68, 5, 138-150.
     72 grade 4, 5 & 6 students were administered tests of
     science achievement after being instructed in concepts
     relating to the particle nature of matter. No correlation
     between age and test performance.
Abstract

Research on the relationship of social and psychological factors,
including student motivation and home and peer environments, to science
learning in grades 6 through 12 was synthesized. Twenty-six studies
conducted over a 16-year period from 1964-1979 were considered. A
quantitative synthesis of findings indicates that motivation and home and
peer environments are important correlates of science learning, and that
results in science parallel those observed in previous syntheses of these
constructs in general educational research.

A Synthesis of Social and Psychological
Influences on Science Learning

Beginning with Jones and Fiske (1953), a number of reviewers have
urged the quantitative synthesis of educational and psychological research
findings (Gage, 1978; Light & Smith, 1971; Rosenthal, 1976). These
reviewers describe a variety of statistical techniques for summarizing and
evaluating a series of empirical findings across investigations. As in
the natural sciences, for example, where estimates of astronomical
constants are made (Ash, Shapiro, & Smith, 1967), these techniques are
intended to provide objective estimates across investigations of the
consistency of observations or coefficients such as means, correlations,
and regression weights; their magnitude and margins of error; and their
boundaries of application.

The purpose of the present review is to synthesize, through the
application of quantitative methods, social and psychological research
on science learning in grades six through twelve, conducted under three
rubrics: student motivation, home or family environment, and peer-group
environment. The present synthesis is part of a larger effort to
synthesize science education research on factors that are productive of
cognitive, affective, and behavioral learning. Those considered are
student ability (including developmental level and prior achievement)
and motivation; amount and quality of instruction; and home, classroom,
and peer-group environments (Walberg, 1978). These factors have been
frequently investigated in general educational research, and show
reasonably consistent and, in most cases, moderate to strong associations
with learning outcomes.

It seems particularly appropriate to investigate learning productivity
in science education at this time for several reasons. The movement
"back to basics" and tightened school budgets threaten to diminish the
place of science in the school curriculum as represented by instructional
time, quality of lesson preparation, and laboratory facilities.
Furthermore, the growing field of science education research has yielded
a large number of published reports that appear ready for parsimonious
integration and summary. Syntheses of educational research in subjects
such as reading and mathematics, focusing on a large number of constructs
and subconstructs, have already been conducted (Walberg, Schiller, &
Haertel, 1979; Uguroglu & Walberg, 1979; Iverson & Walberg, 1979). It is
of interest to know if the results of syntheses carried out in science
yield the same general conclusions, or whether a separate set of learning
laws or "production functions" seems necessary in the special field of
science. The identification of causal factors or constructs, and the
importance of objectively reviewing evidence on them, is in substantial
agreement with the broad review of science-education research needs
carried out by the National Association for Research in Science Teaching
and the National Institute of Education (Yager, 1978).

The constructs of motivation and home and peer-group environment are
placed together in the present synthesis, and somewhat apart from the
others, because these topics have, by comparison, been neglected in
science as well as in general educational research. It is therefore
possible to bring together and discuss all the selected work on these
three constructs in a single paper. Second, these three constructs fit
under the general rubric of social psychology rather than the mainstream
fields of curriculum, instruction, or cognitive and behavioral psychology
that currently seem more influential on educational policy and practice.
Work on the social environment of the classroom is also
social-psychological, but the sizeable number of large-scale studies
necessitates a separate treatment (Haertel, Walberg, & Haertel, 1979).
Lastly, motivation and home and peer-group environments are only
semi-manipulable and under the partial control of educators. They seem
less fixed than mental ability but, on the other hand, more difficult to
influence than teacher behavior or allocation of time in the curriculum.
The science teacher can raise motivation, and perhaps also encourage
science learning in the home and in adolescent peer groups; but such
changes require the cooperation of other agents such as the students
themselves and their families. For these reasons, the three major
constructs are synthesized and compared in the present review.

Literature Search and Selection

Fifteen years of science education literature (1964-1979) were
searched to identify studies relating science achievement and learning
to each of the three construct areas under consideration: student
motivation, home or family environment, and peer environment. This time
period was selected in order to reflect recent growth in curriculum
development and evaluation and to include the most current science
education research. In searching the literature, priority was given to
selecting studies from refereed journals. Search procedures were
extended to unpublished reports and dissertations when the number of
studies located in the published journals did not appear to be sufficient.

For the period 1964 to 1979, studies in the two major research
journals in science education, the Journal of Research in Science
Teaching and Science Education, were scanned. Volumes of School Science
and Mathematics, Journal of Educational Psychology, Developmental
Psychology, and Sociology of Education for the years 1971-1977 were
also searched. Computer searches of studies indexed by the Educational
Resources Information Center (ERIC) and the Social Sciences Citation
Index (SSCI) were conducted. The collection of science education
bibliographies and annual reviews published by the Science, Mathematics,
and Environmental Education Information Analysis Center (ERIC/SMEAC)
was scanned for citations of dissertations and unpublished reports.

Studies were screened and selected for synthesis on the basis of
the following criteria: 1) that the study be concerned with science
learning in grades 6-12; 2) that some measure of student learning in
science (e.g., achievement, attitude, developmental level attained) be
reported; 3) that at least one of the three constructs under
consideration serve as a predictor of science learning. Table 1 presents
definitions of the motivation, home, and peer constructs which guided
this search, and examples of how these constructs were conceptualized
and operationalized in the literature.

Insert Table 1 about here 

The search and selection yielded a total of 20 studies: 5 studies
considering student motivation, 13 of home environment, and 5 of peer
environment (two of the studies selected considered 2 or more of these
construct variables). While numerous studies of the student motivation,
home environment, and peer environment constructs were found, many were
excluded from the analysis for several reasons: measures of science
achievement were either absent or invalid (22 studies); findings relating
the effect of the construct variable to achievement were inadequately
reported (15 studies); reports were based on opinions rather than
evidence (5 studies); or studies considered the effects of science
learning on some measure of the construct variable, such as students'
self-concept in science or compatibility with peers (15 studies). Up to
20 studies of home environment alone were excluded for these reasons. A
complete bibliography of studies selected for inclusion under each
construct is contained in the appendix.

Method of Analysis

All of the studies selected for synthesis were numerically
coded using schemes developed by the investigators for each construct
area. From 40 to 50 study variables were coded in each construct area.
These included the type, source, and validity of science learning and
construct measures; the characteristics of the sample; the type of
design employed; and methodological flaws threatening the validity of
the study. Statistical information, including correlations and
inferential statistics, levels of significance, and the sign or direction
of results, was also recorded for each study. Copies of the coding
schemes used are available from the authors.

The limited number of adequate studies available under these
constructs precluded the use of multivariate techniques of research
synthesis (Glass, 1978). Instead, findings were synthesized by plotting
the correlations, calculating simple statistics, and tabulating "box
scores" denoting the direction (whether positive or negative) of the
relationship between construct variables and learning outcomes.

Results and Discussion 

The majority of studies selected for this synthesis, 16 of 20, were
correlational. Where correlations were reported in studies, these were
recorded for analysis. In studies not reporting correlations between
construct variables and learning outcomes, techniques outlined by Glass
(1978) for converting statistics to correlations were applied when
possible. In studies with insufficient information to derive
correlations from the statistics reported, signs or box scores were coded
denoting the direction of the relationship between construct variables
and science learning outcomes. Studies indicating that science learning
or achievement increased as the construct variable increased were coded
as positive ("+"). Studies showing an inverse relationship or no
relationship between construct and achievement variables were coded as
negative ("-").

Subject characteristics, study features and findings, median
correlations, and box scores of studies under each of the three construct
areas are summarized and discussed below (see Tables 2, 3 and 4). Unless
specified in the table, subjects were from white, middle-class, mixed-sex
populations in the United States. While the sample of studies
represented is limited, the results indicate consistent, positive
findings in studies considering student motivation, home environment, and
peer environment as predictors of science learning. Of the total 20
studies considered, 14 indicated positive signs of the findings. The
binomial probability of this ratio is < .01.
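
A box-score tally of this kind is commonly evaluated with a sign test.
The Python sketch below shows one simple version: a one-sided binomial
probability of observing at least a given number of positive signs when
positive and negative signs are equally likely. The report does not say
how its probability was computed, so this is an assumed procedure that
need not reproduce the reported figure; the function name and the sample
box scores are hypothetical.

```python
from math import comb


def sign_test_p(positives, total, p=0.5):
    """One-sided probability of at least `positives` positive signs
    out of `total` when each sign is positive with probability p."""
    return sum(
        comb(total, k) * p ** k * (1.0 - p) ** (total - k)
        for k in range(positives, total + 1)
    )


# Box scores coded as in the text: "+" for a positive relationship,
# "-" for an inverse relationship or none. Hypothetical example data.
box_scores = ["+", "+", "-", "+", "+"]
n_positive = box_scores.count("+")
print(sign_test_p(n_positive, len(box_scores)))
```

Because null results are coded as negative under the rule described
above, a test of this form is conservative with respect to positive
findings.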

Table 5 presents stem and leaf diagrams (Tukey, 1976) of all
correlations in all studies as well as the median correlations for each
study. The first decimal place of the correlation is represented on the
stem to the left of the vertical line, and the second decimal place is
represented as a leaf to the right of the line; for example, the lowest
and highest outlying correlations for the student motivation construct
are .15 and .58. These diagrams show all the correlations in the studies
as well as the study-median correlations that weight each study equally.
Mean correlations were computed for each construct area using the raw
correlations reported in individual studies. The mean correlations for
the three construct areas are .37 for student motivation, .30 for home
environment, and .24 for peer environment. Results specific to each
construct area are discussed below.

Insert Table 5 about here

Student Motivation

All of the studies of student motivation and science achievement
located showed positive relationships between motivational variables
and learning. These are summarized in Table 2. Three studies considered
measures of academic self-concept (Alvord & Glass, 1974; Raven & Adrian,
1978; Mancini, 1972); one study (Bart, 1978) looked at reported
persistence, and another (Soh, 1973) considered general need-achievement
motivation. Of these studies relating student self-concept to science
learning, only one study (Raven & Adrian, 1978) specifically looked at
students' concept of their ability in science, as opposed to general
academic self-concept.

Insert Table 2 about here

The mean correlation for student motivation and science learning,
.37, is somewhat higher than those obtained in the home and peer
environment constructs, as shown in Table 5. In part, this may be
explained by the fact that standardized scales, having the advantage of
higher measurement reliability, were used to measure motivation
sub-constructs (e.g., self-concept). This, of course, was not the case in
studies of home and peer environment, as will be discussed below. As
noted in a synthesis of student ability and science learning (Boulanger,
1979), construct measures with higher reliability yield higher
correlations with learning outcomes (particularly cognitive ones) than
measures with lower reliabilities.
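
The link between measurement reliability and observed correlation can be
illustrated with the classical attenuation relation from test theory,
under which the observed correlation equals the true-score correlation
multiplied by the square root of the product of the two reliabilities.
The Python sketch below applies that relation. The function name and all
numerical values are hypothetical and are not taken from the studies
reviewed.

```python
import math


def observed_correlation(true_r, reliability_x, reliability_y):
    """Correlation expected between two imperfectly reliable measures,
    given the correlation between their true scores."""
    return true_r * math.sqrt(reliability_x * reliability_y)


# Same hypothetical true-score correlation, two construct measures that
# differ only in reliability; the outcome measure is held fixed.
standardized_scale = observed_correlation(0.50, 0.90, 0.85)
ad_hoc_index = observed_correlation(0.50, 0.60, 0.85)
print(f"{standardized_scale:.2f} versus {ad_hoc_index:.2f}")
```

The less reliable construct measure yields the smaller observed
correlation even though the underlying relationship is identical, which is
the pattern described above.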

Individual correlations reported for the student motivation construct
area in Table 5 indicate a median correlation of .33. Previous studies
of student motivation and general educational achievement conducted by
Bloom (1976) and Uguroglu and Walberg (1979) report median correlations
of .35 and .30, respectively. These studies were based on large national
samples, and included correlations with achievement data from reading and
mathematics. The similarity of the correlations found in this study with
those reported by Bloom, and Uguroglu and Walberg, suggests that the
"productive function" of student motivation in learning and achievement is
independent of subject area or content. This possibility warrants further
study. Motivational factors in science learning, in general, merit greater
attention than they have received from science educators, as evidenced by
these findings.

Home Environment

All of the studies selected in this construct area, as summarized
in Table 3, contained measures of parents' socio-economic status, or SES, and
science learning. Among the SES indices considered were parent occupation,
parent education, and community SES. Of the 13 studies considered,
9 show positive relationships between parental SES and
science learning: students from homes of higher socio-economic status
scored higher on achievement measures of logical operations (Bart,
1978), science attitudes and interests (Neujahr & Hansen, 1970;
Hasan, 1975; James & Pafford, 1973; Keeves, 1975), general cognitive
learning in science (Hardy, 1970; Keeves, 1975; Klein, 1971;
Troost, 1969), critical thinking (Hardy, 1970), and factual learning
(Lynch et al., 1979). Studies showing no significant relationship
between SES and science achievement are those considering process
learning (Quinn & George, 1975), factual learning (Ashbaugh, 1968),
and science attitudes and interests (Wynn & Bledsoe, 1967).

Insert Table 3 about here 

The mean value of correlations reported between SES and science
learning was computed as .35. White
(1976) obtained a mean correlation of .26 between parental social class
indices and measures of verbal and mathematics achievement. Again, as in
the case of student motivation, the correlation obtained in science is
similar to that obtained in earlier work on general educational achievement,
based on a larger sample of studies.

In addition to SES, several studies considered other indices of
home environment. Among these sub-constructs were parent education
(Hasan, 1975), parental aspirations for student achievement (Bart,
1978; Hasan, 1975; Keeves, 1975), parent involvement in the student's
education (Bart, 1978), and the presence of science equipment in the
home (Neujahr & Hansen, 1970). The mean of correlations reported for
these indices was computed as .36. Higher correlations with learning
were therefore obtained for these indices than for more general SES
measures. Again, this correlation is similar to that reported elsewhere
for verbal and mathematics achievement. Iverson and Walberg (1978)
obtained a mean correlation of .35 for studies considering parent stimulation
of the child with measures of verbal ability and general educational
achievement.

A particularly noteworthy study in this construct area is that of
Keeves (1975), who considered multiple predictors of achievement, and
both learning and interest in science. The effects of father's occupation,
parental aspirations for the child, parent involvement in the
school, and general SES level on science attitudes and interests, and
general cognitive achievement in science, were investigated. His study
was based on a randomly selected sample of 215 Australian sixth and
seventh grade students. Science learning and interest were measured
by specially prepared attitude questionnaires and achievement tests
in science.

In other studies, the most frequently used methods for collecting
home environment information were student questionnaires (Neujahr &
Hansen, 1970; Hasan, 1975; Stronck, 1974) and the use of school archives
(Hardy, 1970; James & Pafford, 1973; Wynn & Bledsoe, 1967). Three
studies failed to report the methods used for securing home data (Bart,
1978; Ashbaugh, 1968; and Troost, 1969). The reliabilities of the measures
used in these studies are seldom reported.

Peer Environment 

Of the five studies considering the effects of peer environment
and science learning in Table 4, three were concerned with the effects
of within-class grouping on cognitive science learning, i.e., with
the effects of individual vs. group work (Gabel & Herron,
1977; Linn et al., 1977), and homogeneous vs. heterogeneous ability groupings
(Bicak, 1964).

Of these three studies, only Gabel and Herron show a positive
relationship between learning and peer environment. They report that,
in their urban sample (the study also included a rural sample), group
work had a positive effect on factual learning in general science. This
result was not replicated in their rural sample. Bicak found no significant
effect for ability grouping on the learning of science material
in meteorology, while Linn et al. found no effects for individual
vs. group work on the acquisition of logical operations.

Insert Table 4 about here 

In his study of logical operations in urban adolescents, Bart
(1978) reported a correlation of .25 for teacher ratings of students'
"rapport with peers." Keeves (1975) considered the effects of friends',
or peers', participation in science and mathematics activities on students'
cognitive achievement and critical thinking in general science.
He reported correlations of .23 and .24, respectively, for these measures.

That the number of studies considering the effects of the peer
environment on science learning over the past fifteen years is so
limited is noteworthy. This is particularly so in light of the attention
previously given to peer influences on achievement in the general
educational literature (Coleman, 1961). Of note, too, is the observation
that none of the studies reviewed here considered sociological
or extra-curricular aspects of the peer environment on science achievement.
They were, rather, restricted to the consideration of peer influences
within the classroom. That peers exert considerable influence
outside the school on curricular choices and academic achievement has
been demonstrated in previous research on adolescence (Bradley, 1977;
Spencer, 1976; Kandel & Lesser, 1969).

Conclusions

As the results of the literature search and selection undertaken
in these construct areas demonstrate, science educators have paid
little attention to student motivation, home environment, and peer
environment variables in the study of science achievement. Nevertheless,
the consistent, positive direction of findings observed in
studies of these constructs makes a strong case for their inclusion
in future research. Student motivation, and home and peer environment
factors, appear to be important correlates of science learning.
They deserve closer attention from the science educator, since
academic achievement associated with these constructs is subject to
environmental intervention, either through instruction or counseling.

The consistency and parallelism of results observed in studies
of student motivation and home environment with previous work in
general education suggests the need for further direct investigation
of these constructs. The incorporation of such constructs as control
or stratification factors in curriculum and instructional research
is recommended, and the value of attempts to manipulate these constructs
experimentally in the hope of making science education more productive
is indicated.

Table 1

Definition and Examples of Student Motivation, Home Environment, and Peer Environment Constructs



Construct: Student Motivation
Definition: Any measured intrinsic drive or extrinsic reward that influences student performance during an instructional treatment or test situation.
Example Measures: Self-concept, persistence, need-achievement, test anxiety.

Construct: Home Environment
Definition: Any characteristic of environments over which a parent or guardian exerts direct control as opposed to classroom or peer group environment.
Example Measures: Parent occupation (SES), presence of science-related equipment and documents in the home, parent involvement in school work.

Construct: Peer Environment
Definition: Characteristics of the students' beliefs, practices, and social activities associated with peer group beliefs and practices.
Example Measures: Ability tracking (between classes), school activities (extra-curricular), instructional grouping (within classes).

Table 2

Student Motivation and Science Learning Studies: Subjects, Features, and Findings



Author (Date): Alvord & Glass (1974)
Subjects: 3162 grade 4, 7, 12 students
Feature: Academic achievement in science as measured by NAEP tests, and self-concept
Finding: Positive correlation between achievement and self-concept
Sign / r_xy: + .16

Author (Date): Bart (1978)
Subjects: 285 urban high school students, aged 13-19; heterogeneous racial, ethnic and SES backgrounds
Feature: Adolescent formal reasoning and teacher's evaluation of task persistence (peer and home environment also considered)
Finding: Positive correlation between formal reasoning and persistence
Sign / r_xy: + .26

Author (Date): Mancini (1972)
Subjects: 267 suburban grade 7 students
Feature: Self-concept of academic ability, and achievement in biology
Finding: Students with higher self-concept evidenced higher achievement

Author (Date): Raven & Adrian (1978)
Subjects: 249 grade 9-11 rural, average and above average students
Feature: General science achievement, and general self-concept of ability and concept of ability in science
Finding: Positive correlation between achievement and general and science self-concepts
Sign / r_xy: + .47

Author (Date): Soh (1973)
Subjects: 170 high ability second year male students from English Grammar Schools
Feature: Comparison of the motivational orientations of students with, and without, career interests in science
Finding: Students with greater preference for science careers evidenced higher achievement motivation

+ = Positive relationship between construct variable and science learning
- = Negative relationship between construct variable and science learning
a = Median of reported correlations



Table 3

Home Environment and Science Learning Studies: Subjects, Features, and Findings

Author (Date): Ashbaugh (1968)
Subjects: 430 grade 4-6 students from upper middle class suburban communities
Feature: Attainment of geological concepts and SES
Finding: No differences in learning as a function of SES level

Author (Date): Bart (1978)
Subjects: 285 urban high school students, aged 13-19; heterogeneous racial, ethnic, and SES backgrounds
Feature: Adolescent formal reasoning and parent involvement in the school, parent aspirations for the child, and SES (student motivation and peer environment also considered)
Finding: Positive correlation between formal reasoning achievement and home environment
Sign / r_xy: + .30

Author (Date): Hardy (1970)
Subjects: 208 chemistry students, 104 enrolled in CHEM Study, 104 in traditional chemistry courses
Feature: Critical thinking and performance on standardized achievement test correlated with SES
Finding: Positive correlation between SES and critical thinking and achievement
Sign / r_xy: + .26

Author (Date): Hasan (1975)
Subjects: 340 grade 11 Jordanian students
Feature: Student interest in science and parents' education (SES), and parent aspirations
Finding: No differences in interest as a function of parents' education, but positive relationship found between science careers desired by parents and student science interest
Sign / r_xy: [illegible]

Author (Date): James & Pafford (1973)
Subjects: 84 grade 12 students
Feature: Student interest in science and father's occupation (SES)
Finding: Students with professional fathers elected more science courses than those of non-professional fathers

+ = Positive relationship between construct variable and science learning
- = Negative relationship between construct variable and science learning
a = Median of reported correlations



Table 3 (continued)

Author (Date): Keeves (1975)
Subjects: 215 Australian grade 6-7 students
Feature: Father's occupation, parent aspirations, parent involvement in the school, and general SES level; and general science achievement and science attitude (also considered student motivation and peer environment)
Finding: Positive correlation between home environment indices and achievement and attitudes
Sign / r_xy: + .35

Author (Date): Klein (1971)
Subjects: 310 grade 6 students
Feature: General science learning and SES
Finding: Positive correlation between achievement and SES

Author (Date): Lynch et al. (1979)
Subjects: 1635 grade 7-10 Australian students
Feature: Performance on a test of factual science learning, and SES
Finding: Positive correlation between test performance and SES
Sign / r_xy: + .14

Author (Date): Neujahr & Hansen (1970)
Subjects: 194 students from a high school science honors program
Feature: Students' interest in science (as evidenced by subsequent academic work in science); and fathers' occupation (SES), and presence of science equipment in the home
Finding: Positive correlations between interest and home environment indices
Sign / r_xy: + .17

Author (Date): Quinn & George (1975)
Subjects: 176 grade 6 students from urban and suburban schools
Feature: Performance on a hypothesis formation task, and SES
Finding: No differences in performance observed as a function of SES

Author (Date): Stronck (1974)
Subjects: 700 grade 10-12 students from Texas
Feature: Performance on a statewide scholarship test of general science learning, and SES
Finding: Positive correlation between test performance and SES



Author (Date): Troost (1969)
Subjects: 54 grade 7-9 students of diverse ethnic origin
Feature: Achievement following a summer program in space science, and SES
Finding: Positive correlation between achievement and SES
Sign / r_xy: + .21

Author (Date): Wynn & Bledsoe (1967)
Subjects: 325 urban, grade 11-12 students
Feature: Students' interest in science and SES
Finding: No difference in interest found as a function of SES

Table 4



Author (Date): Bart (1978)
Subjects: 285 urban high school students, aged 13-19; heterogeneous racial, ethnic and SES backgrounds
Feature: Adolescent formal reasoning and teacher's evaluation of rapport with peers (student motivation and home environment also considered)
Finding: Positive correlation between achievement and rapport with peers
Sign / r_xy: + .25

Author (Date): Bicak (1964)
Subjects: 77 grade 8 students
Feature: Homogeneous vs. heterogeneous ability grouping on achievement in a local science course
Finding: No differences between homogeneous and heterogeneous ability groups in achievement

Author (Date): Gabel & Herron (1977)
Subjects: 1022 grade 7 ISCS students from county and city schools
Feature: Group work vs. individual work on retention
Finding: Higher retention shown for city students working with partner. No differences found in the county sample

Author (Date): Keeves (1975)
Subjects: 215 Australian grade 6-7 students
Feature: Peer participation in science and math, and general science achievement and science attitude (also considered student motivation and home environment)
Finding: Positive correlation between peer environment, and achievement and attitudes
Sign / r_xy: + .24

Author (Date): Linn et al. (1977)
Subjects: 132 grade 5-6 students in a lower middle class urban school
Feature: Individual work vs. elective group work on promoting students' ability to control variables
Finding: No differences in achievement for individual and elective group work

+ = Positive relationship between construct variable and science learning
- = Negative relationship between construct variable and science learning
a = Median of reported correlations

Table 5

Stem and Leaf Diagrams of Individual and Study-Median Correlations for
Student Motivation, Home Environment, and Peer Environment Construct Variables

[Stem-and-leaf displays: panels for Student Motivation, Home Environment, and Peer Environment, each with INDIVIDUAL and MEDIAN columns on stems .0 through .6]

              Student Motivation    Home Environment    Peer Environment
Mean                 .36                  .30                 .24
Median               .33                  .26                 .24
SD                   .15                  .13                 .01

Abstract

The purpose of this study was to quantitatively synthesize quality and
quantity of instruction studies with the same or similar independent
variables in the published science education grade 6-12 research of
the 1963-1978 period. Fifty-two studies formed six clusters and revealed
significant positive cognitive outcomes due to the use of preinstructional
strategies, training in scientific thinking, increased
structure in the verbal content of materials, and increased realism or
concreteness in adjunct materials. In general, systematic innovation
in instruction was found to produce positive improvements over the norm
or traditional practice. Methodologically, improved research design
quality was related to larger effect sizes. Recommendations are made
regarding replication, use of multiple measures, attitudinal research,
use of general education findings, and the reporting of research.

Instruction and Science Learning:
A Quantitative Synthesis 

Research on the quality of instruction is extensive, diverse, complicated,
and often inconclusive. Reviews of hundreds of studies have resulted in
disappointment expressed by many reviewers in what they interpret as a lack
of substantive research in the quality of instruction and its influence on
student learning (Travers, 1973). Yet other reviewers, using quantitative
synthesis techniques, have found positive empirical support for the influence
of several factors on learning. Bloom (1976) identified instructional cues,
participation, and reinforcement as accounting for up to 25% of the variance
in student learning. Rosenshine (1979) summarized the work of several major
researchers and found evidence for instructional time, content coverage, and
direct instruction strategies as major influences on learning. Walberg, Schiller
and Haertel (1979) tabulated the results of recent reviews on the relation of
instructional and other educational conditions to learning outcomes and found
a number of consistent, positive results.

One reason for the differing views on the summative findings in a given
area of research is the qualitative character of attempts at research synthesis.
Long narratives citing study after study provide little basis for objective
comparisons and accumulation of results. If study characteristics and
outcomes could be quantified, research synthesis might gain new precision and
objectivity, providing a finer measure of what is known as well as a better
knowledge of the gaps and flaws in the accumulated research.

Based on theoretical considerations and the accumulating empirical evidence,
Walberg (in press) developed a productivity model incorporating eight constructs as
major factors in student learning. The constructs are: student ability, motivation,
and age or developmental level; quality and quantity of instruction; and
classroom, home and peer environments. Using quantitative research synthesis
techniques, estimates of the size of the contributions of each construct
to general learning outcomes were prepared (Haertel, Walberg and Haertel, Note 1;
Iverson and Walberg, Note 2; Uguroglu and Walberg, in press). The productivity
model provides a framework of constructs known to be important factors in general
learning and, therefore, likely to be important in science learning. Yager
(1978) identified the need for reviews of science education research and guidance
from the findings of general education research as national priorities for
science education. The present study was conducted to meet these needs by
quantitatively synthesizing the science education research on learning for two of
the constructs, the quality and quantity of instruction.

Purpose and Method 

The purpose of the present study was to quantitatively synthesize the published
science education quality and quantity of instruction research performed with subjects
in grades 6 through 12 over the 1963-1978 period. This period and grade range
were chosen to include the recent growth in research and curriculum development
with the precollege students enrolled in the range of general to specialized
science courses. A quantitative approach to the synthesis was chosen
to provide comparable indices of the characteristics and outcomes within
and among homogeneous groups or clusters of studies. The quantitative techniques
of research synthesis advocated by Glass (1978) are employed. Quantitative synthesis
is intended to complement traditional qualitative syntheses such as the annual
Summary of Research in Science Education, e.g., Petersen and Carlson (1979).
Quantitative techniques require multiple studies relating the same or similar
variables in terms of comparable statistics such as signs, effect sizes, and
correlations.

Literature Search and Selection

One of the most difficult tasks in research synthesis is deciding what
constitutes similar studies suitable for integration. Quality of instruction
is a multi-dimensional construct encompassing many definitions and points of
view. Rather than defining the construct a priori, it was decided to let the
body of science education research define it through a simple count of
independent variables receiving the most attention in experimental research on
science instruction. The primary source of literature references was the
collection of ERIC science education bibliographies and annual reviews.
This, combined with a scanning of all studies in the two major research journals
in science education, Journal of Research in Science Teaching and Science
Education, resulted in the identification of 137 published studies in the
quality construct and 3 on the quantity of instruction, 2 published and 1
dissertation. (The quantity of instruction studies will be discussed later.)
Ninety-five of the quality of instruction studies involved an instructional
situation manipulated in an experimental fashion and learning outcomes measured.
The additional 42 studies were curriculum comparisons and, due to the
poorly defined nature of the treatments, were eliminated from further consideration.
The 95 studies were categorized by independent variables and the categories
and frequencies tabulated (see Table 1).

Insert Table 1 about here 



A minimum of five studies was set as the criterion for inclusion of an
independent variable or clustering of closely related independent variables
in the synthesis, since the binomial probability of five independent studies
having the same outcome direction is less than .05. This criterion would
allow a strong test of the effectiveness of one treatment over another. For
example, if the treatment group receiving indirect instruction achieved a
higher mean score than the direct instruction group in five out of five independent
experiments, this would be accepted as strong evidence for the general
superiority of indirect instruction.
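
The five-study criterion can be checked with a short calculation. The sketch below assumes, as the text implies but does not state, a null hypothesis under which each independent study is equally likely to favor either treatment, and a favored direction specified in advance; the function name is illustrative only.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Binomial probability that at least k of n independent studies
    favor the pre-specified treatment when each does so with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_five_of_five = prob_at_least(5, 5)
p_four_of_five = prob_at_least(4, 5)

print(p_five_of_five)  # 0.03125
print(p_four_of_five)  # 0.1875
```

Under these assumptions, five agreeing outcomes out of five fall below the .05 level, whereas four out of five would not, which is consistent with five being the smallest cluster size that permits such a test.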

Applying the above criterion, six clusters totaling 52 studies were
identified: preinstructional strategies, indirectness of instruction, inductive
vs. deductive strategies, training in scientific thinking, structure in
the verbal content of materials, and realism or concreteness in adjunct materials.
Table 2 gives cluster component variables, operational definitions,
and number of studies.

Insert Table 2 about here



Coding. 

A numerical coding scheme for study variables (characteristics) was
developed prior to study selection and refined as study coding progressed.
Each comparison of treatment means in each study was coded according to
approximately 40 study variables: dependent measure type, origin,
and reliability; subject grade, sex, ethnic group, and academic achievement
level; community SES and urban-rural character; subject matter of treatments
and sources of curriculum; constructs measured other than quality of instruction;
treatment characteristics including group size, elective or
required course participation, regular or special teacher, lab or non-lab
focus, reliability of implementation, length, and equality of control group
access to content; study design and nine categories of threats to validity;
sample size; and outcome statistics, i.e., direction of effect, level of
significance, and effect size.

Effect size is a normalized measure of the difference between two
treatment groups in performance on a dependent measure. Nearly all effect
sizes were computed using one of the following two formulas (Glass, 1978):

$$ES = \frac{\bar{X}_e - \bar{X}_c}{S_c} \qquad\qquad ES = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$\bar{X}_e$ and $\bar{X}_c$ represent experimental and control group means, respectively, $S_c$
is the standard deviation of the control group, and $t$ is the computed t-test
statistic. If an F-test were used in a one-way analysis of variance to compare
two groups, the F value was considered equal to $t^2$. If only the total
sample size was given, it was assumed that $n_1 = n_2$, since equal n's provide a more
conservative estimate of effect size than unequal n's. Finally, in cases
where one-way analysis of variance was used, homogeneity of variances was
assumed, setting $S_c = \sqrt{MS_w}$. Two-way analysis of variance tables, however,
without other statistics, were insufficient for computing effect sizes.
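
The effect size rules described above can be sketched as a few small functions. This is an illustration under stated assumptions rather than the procedure actually used in the project: the function and parameter names are hypothetical, the example values are invented, and the sign of an effect size recovered from F must be supplied separately because F carries no direction.

```python
import math


def es_from_means(mean_e, mean_c, sd_c):
    """ES = (experimental mean - control mean) / control-group SD."""
    return (mean_e - mean_c) / sd_c


def es_from_t(t, n1, n2):
    """ES = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def es_from_f(f, n_total, favors_experimental=True):
    """Two-group one-way ANOVA: treat F as t squared.

    Only the total sample size is assumed known, so n1 = n2 = n_total / 2.
    The direction of the effect is an input because F is unsigned.
    """
    es = es_from_t(math.sqrt(f), n_total / 2, n_total / 2)
    return es if favors_experimental else -es


def es_from_anova_means(mean_e, mean_c, ms_within):
    """Homogeneity of variance assumed: control SD estimated as sqrt(MS within)."""
    return es_from_means(mean_e, mean_c, math.sqrt(ms_within))


# Hypothetical values, not taken from any reviewed study.
print(es_from_means(55.0, 50.0, 10.0))
print(es_from_t(2.0, 50, 50))
print(es_from_f(4.0, 100))
print(es_from_anova_means(55.0, 50.0, 100.0))
```

For a fixed total sample size, the factor sqrt(1/n1 + 1/n2) is smallest when the groups are equal, which is why the equal-n assumption gives the more conservative estimate.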

Each dependent variable in each study was placed in one of four categories.

1. Factual learning (recall, recognition of treatment content; retention
test)

2. Conceptual learning (concept attainment, science processes or logical
operations, critical thinking; standardized achievement test)

3. Attitudinal learning (any affective measure of opinion, attitude
or interest)

4. Laboratory performance test

Methodological flaws (Cook and Campbell, 1976) were examined and coded as
either 1) "potential threat" or 2) "adequately minimized." Flaws examined
were: reliability of treatment; statistical power; error rate; maturation;
history; selection bias; contamination, compensation or differential incentives;
mortality; and generalizability. A simple sum of these ratings yielded
an overall index of design quality.
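
The design quality index amounts to a sum of nine ratings, each coded 1 or 2. A minimal sketch follows, assuming one rating per flaw and no weighting; the dictionary layout, the names, and the two example studies are hypothetical.

```python
# The nine flaw categories listed in the text.
FLAWS = (
    "reliability of treatment",
    "statistical power",
    "error rate",
    "maturation",
    "history",
    "selection bias",
    "contamination, compensation or differential incentives",
    "mortality",
    "generalizability",
)

POTENTIAL_THREAT = 1
ADEQUATELY_MINIMIZED = 2


def design_quality_index(ratings):
    """Sum the nine flaw ratings; higher totals mean fewer unresolved threats."""
    for flaw in FLAWS:
        if ratings.get(flaw) not in (POTENTIAL_THREAT, ADEQUATELY_MINIMIZED):
            raise ValueError(f"missing or invalid rating for {flaw!r}")
    return sum(ratings[flaw] for flaw in FLAWS)


# Hypothetical studies: one with every threat minimized, one with two open threats.
well_controlled = {flaw: ADEQUATELY_MINIMIZED for flaw in FLAWS}
weaker = {
    **well_controlled,
    "selection bias": POTENTIAL_THREAT,
    "history": POTENTIAL_THREAT,
}

print(design_quality_index(well_controlled), design_quality_index(weaker))
```

With this coding the index ranges from 9 (every flaw a potential threat) to 18 (every flaw adequately minimized).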

Given the wide range in the number (1 to 11) of comparisons in different studies,
and given the limited number of studies in any one cluster, it was
decided to use the median effect size from each study in each outcome category.
The median effect size has the advantages of greater stability than
the mean and meets the criticism of lack of independence when multiple effect
sizes are drawn from the same study. The 52 quality of instruction
studies yielded 160 raw comparisons, which reduced to 69 median comparisons.
(A few studies were useful without computable effect sizes; therefore, there
are a few more comparisons than effect sizes.) Based on a small sampling
of studies read independently by two raters, 90 percent agreement between
raters was readily attained in coding the 40 study variables. The appendix
contains a bibliography of all studies by cluster. Abstracts of each study,
a code book, code sheet and a table of coded values are available in the
project final report (Walberg, Boulanger, Kremer and Haertel, Note 3).
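
The reduction from raw comparisons to one median per study and outcome category can be sketched as follows. The record layout (study, category, effect size) and all values are hypothetical and serve only to show the grouping step.

```python
from collections import defaultdict
from statistics import median


def study_medians(comparisons):
    """Collapse raw comparisons to one median effect size per (study, category).

    `comparisons` is an iterable of (study, outcome_category, effect_size).
    """
    grouped = defaultdict(list)
    for study, category, effect_size in comparisons:
        grouped[(study, category)].append(effect_size)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical raw comparisons from two studies.
raw = [
    ("Study A", "conceptual", 0.20),
    ("Study A", "conceptual", 0.60),
    ("Study A", "conceptual", 0.40),
    ("Study A", "factual", 0.10),
    ("Study B", "conceptual", 0.80),
    ("Study B", "conceptual", 1.00),
]

medians = study_medians(raw)
for key in sorted(medians):
    print(key, medians[key])
```

Six raw comparisons reduce here to three median comparisons, so that no study contributes more than one value to any outcome category.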

Analysis and Discussion 

With the completion of coding it was apparent that many study variables
were not available in the study reports (e.g., subject ethnic group and community
SES and urban-rural character) or were constant across studies (e.g.,
mixed sex of sample and local origin of the treatment) and would, therefore,
provide little help in identifying sources of variation across studies.
Only study variables adequately reported and with non-constant values
were considered in the analysis.

Across all studies, the distribution of median effect sizes in dependent
variable categories was: 38 conceptual, 14 factual, 4 attitudinal, and 5
laboratory performance outcomes. Since the trends in size and direction of
the factual outcomes conformed closely with the conceptual outcomes in any
given cluster, and given the great overlap in content of factual and conceptual
measures, the two outcome categories were combined into one category
named cognitive outcome. The number of positive comparisons and the mean of
median effect sizes for cognitive outcomes in each cluster, and the associated 95 percent
confidence interval for each mean, are summarized in Table 3 and discussed
below in terms of trends in other coded study variables. Later, the entire set
of quality of instruction studies will be analyzed and discussed. Reference to
"significance" in the following sections refers to statistical significance at
the .05 level.
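
A cluster mean and its 95 percent confidence interval can be computed from the study medians as sketched below. The report does not say how its intervals were obtained; a t-based interval on the study medians is assumed here, the critical values are the standard two-sided tabled values, and the effect sizes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95 percent critical values of Student's t for 1 to 10 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_with_ci(effect_sizes):
    """Return (mean, lower, upper) for a t-based 95 percent confidence interval.

    For more than 10 degrees of freedom the normal value 1.96 is used as a
    rough approximation, which is slightly too narrow for moderate samples.
    """
    n = len(effect_sizes)
    if n < 2:
        raise ValueError("at least two effect sizes are needed")
    m = mean(effect_sizes)
    standard_error = stdev(effect_sizes) / sqrt(n)
    half_width = T_CRIT_95.get(n - 1, 1.96) * standard_error
    return m, m - half_width, m + half_width


# Hypothetical median effect sizes for one cluster of five studies.
cluster = [0.2, 0.5, 0.8, 1.1, 1.4]
m, lower, upper = mean_with_ci(cluster)
significant = lower > 0 or upper < 0  # interval excludes zero

print(round(m, 2), round(lower, 2), round(upper, 2), significant)
```

A cluster mean is then called significant at the .05 level when its interval excludes zero, which matches the usage of "significance" defined above.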

Insert Table 3 about here 

1) Preinstructional strategies. Three subgroupings of studies form the
preinstructional strategies cluster: four studies on advance organizers, five
on behavioral objectives and two on set induction. Each included study compared
the effect of the strategy with a comparable instructional treatment
where no preinstructional strategy or a placebo strategy was used.

Eight studies on a total of 1204 subjects resulted in a mean cognitive
effect size of 1.03, significantly positive and favorable to the use of a
preinstructional strategy. Seven of nine effect sizes (one study contributed
two) were associated with significant differences, all favorable to the strategies.
The strongest contributors to the large effect were studies on the
use of behavioral objectives and set induction, with 5 of these studies having
significant findings. Inspection of the two weakest effect sizes in this
cluster of studies revealed that both originated from the same source
(Santiesteban, 1977), a study with the shortest treatment length, less than 1 hour,
of any study in the cluster. By contrast, the greatest effect size (Olsen,
1973) resulted from a course-length treatment on the largest sample in the
cluster and with the highest design quality rating. Examination of other
study variables indicated that the most effective strategies were conducted
by trained regular teachers using prepared materials with their own students,
rather than materials used alone without teacher intervention.

2) Indirectness of instruction. Two subgroupings formed this cluster.
One subgroup of seven studies, called "non-direct versus direct," compared
teacher or workbook controlled instruction with instruction allowing, by
comparison, greater student choice in content and/or method. The other
subgroup (two studies), called "indirect versus direct," used Flanders
Interaction Analysis to monitor the degree of teacher indirectness in
lecture-discussion settings. The learning of students of high indirect versus
low indirect teachers was compared ex post facto. For coding purposes
these latter two studies were classified as quasi-experiments.

Eight studies totaling 1135 subjects resulted in a mean cognitive effect
size of .11, favorable, though not significantly, to the non-direct approach.
Five of the 10 effect sizes yielded significant differences, three
favorable to the non-direct or indirect, and two to the direct approach.
These results indicate no general tendency for one approach to be superior
to the other. A trend was noted in the four positive effect sizes: all
were from studies conducted in grade 10 or above.

The two reported attitudinal effect sizes almost exactly cancelled
each other for a mean of .002. The study (Campbell, 1971) showing a
significant effect size favorable to the indirect approach had the weakest
design quality for this cluster, while the study (Kline, 1971) with the
opposite outcome had the strongest design quality rating. Both studies
were with required junior high courses.

3) Inductive vs. deductive strategies. This cluster of studies bears some
resemblance to the indirectness of instruction cluster in that the deductive or
expository strategies always involved a greater degree of teacher and/or printed
materials verbal directness in the instructional process. This cluster differed
from the indirectness cluster in that the sequencing of instructional components
in the two competing treatments always had the flavor of one being the reverse of
the other, e.g., from rule-to-example compared to from example-to-rule. There
were no subgroupings of studies in this cluster.

Seven studies with cognitive outcomes gave a mean effect size of -.22,
favorable to the deductive strategy. In terms of direction of effect, seven
of nine comparisons favored the deductive strategy, but only one was
significant. The two largest effect sizes (Babikian, 1971 and Thomas, 1969) both
involved regular teachers using prepared materials with their own students in
8th grade required science courses. The stronger of the two studies (Babikian:
a true experiment with higher design quality rating) yielded the highest
and only significant effect size. However, a study (Tanner, 1969) comparable
to the highest effect size study in many respects (true experiment
with regular 9th grade teachers in a required course over similar treatment
length), but using materials only (no teacher intervention) and with the
highest design quality rating in this cluster, resulted in no significant
differences on conceptual outcomes. As in the case of the indirectness cluster
studies, the mean effect size was not significantly different from zero and
no conclusion can be drawn about the superiority of one approach.

Comparing the inductive vs. deductive cluster with the indirectness
cluster, there was evidence of a continuation of the pattern suggested
earlier, namely, that one teaching strategy may be more effective with upper
grade students, while the other strategy is more effective in lower grades.

Figure 1 is a scatter plot of all cognitive outcome effect sizes for
both the indirectness of teaching role cluster and the inductive vs. deductive
cluster against grade level. The correlation (r = .48) of effect size
with grade level is significant. The trend is worthy of further research.
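
The significance claim for r = .48 can be checked with the usual t transformation of a Pearson correlation. The sketch below is illustrative only. The number of plotted points is not stated in this passage, so n = 19 (the 10 and 9 comparisons mentioned for the two clusters) is an assumption, and the conclusion is sensitive to it. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# r is taken from the text; n = 19 is an assumption about Figure 1.
t_value = t_for_r(0.48, 19)
T_CRIT_DF17 = 2.110  # two-tailed .05 critical value for 17 degrees of freedom
print(f"t = {t_value:.2f}, exceeds critical value: {t_value > T_CRIT_DF17}")
```

With fewer points the same r gives a smaller t, which is one reason the trend is better treated as a lead for further research than as a settled result.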

Only one attitude outcome was reported in this cluster. It favored the
deductive approach but was non-significant.

4) Training in scientific thinking. Two subgroups formed this cluster:
seven studies attempting to train subjects in some aspect of Piagetian related
logical operations and two studies of the effects of training in the processes
of science. The mean cognitive outcome effect size for the cluster, based on
716 subjects in eight studies, was .89, significantly positive and favorable to
training students to use logical operations or processes of science. Eight of
the 11 median effect sizes were based on significant differences, all favorable
to the effectiveness of whatever training strategy was used in the study.
However, only one of the eight significant differences was from a study where the
control group had equal access to the content being taught.

Examining other study variables, the strong mean effect size is a clear
statement that progress in scientific thinking can be made in a wide range of
grade levels (grades 5-9), in relatively short treatment periods (2 to 10 hours),
as part of required courses where a special teacher or special materials present
carefully designed instruction to individual students. Only one study (Howe,
1977) of the significant studies had the regular teacher working with a class
size group. That study was a quasi-experiment with a very low (10) design quality
rating. No attitudinal or laboratory outcomes were reported.

5) Structure in the verbal content of materials. This is the most tightly
defined cluster of the six discussed in this paper. All five studies in the
cluster use Anderson's (1971) analysis and operational definition of structure.
The operational definition takes the form of formulas used for computing certain
structural coefficients based on a careful analysis of printed materials. In
each study, the learning of subjects using high structure materials was compared
to subjects using lower structure materials. The cognitive outcome mean effect
size was .74, significantly positive and based on six effect sizes all favorable
(three significantly) to the higher structure treatment.

The homogeneity of this cluster of studies is evident in a brief
examination of study variables. All are short (one hour or less) treatments in
biology or life science, administered to individuals in true experiments where the
control group has, with one exception, equal access to the content. All
treatments are without teacher intervention, based only on printed (or audio taped)
materials in, with one exception, non-laboratory settings. All studies are of
high design quality.

One laboratory outcome effect size, 1.364, was found. It was significant
and in favor of the higher structure treatment.

6) Realism or concreteness of adjunct materials. Studies in this cluster
have a common feature of comparing instructional treatments differing in their
positions on the instructional materials concrete-symbolic continuum.
Comparisons might involve manipulative vs. pictorial materials, laboratory based vs
lecture based instruction or, more commonly, pictorial vs verbal presentations
in printed matter. In coding each study, the experimental group was always
the group receiving the more concrete or realistically illustrated instruction.

All nine studies with cognitive outcomes were favorable to the more
concrete or realistic instructional mode, yielding a significant mean effect size
of .58 based on a total of 512 subjects. Five of the nine outcomes were
significant, with four of these from true experiment studies of high design
quality. Six of the nine studies used instructional materials only with no
teacher intervention, seven were of short duration (less than ten hours) and
seven involved individuals working alone with the materials. The major
exception to these trends was a year long study (Yager, 1969) comparing a
laboratory based approach to a comparable content, expository approach. This
study resulted in one of the lowest, yet positive, effect sizes (.131).
Overall, the evidence was strongly supportive of the value of realism and
concreteness in adjunct instructional materials to teach conceptual content.

Only one lab performance outcome (effect size 1.540) was reported. It
was favorable to realism or concreteness, while the one attitudinal outcome
(effect size -.848) was favorable to an expository over a laboratory approach.
Both were from the Yager study.

Conclusions: Quality of Instruction Clusters 

Based on the published science education research on subjects in grades
6 through 12 for the period 1963 through 1978, conclusions about the impact
on student learning of certain aspects of the quality of science instruction
are stated here.

1) Preinstructional strategies, especially the use of behavioral
objectives and set induction but also advance organizers, can improve student
conceptual learning when used with other instructional activities by classroom
teachers. The mean effect size of 1.03 is significantly positive and is equivalent to
an improvement of about one standard deviation (34 percentile points) when the
treated group is compared to a control group having access to the content of
instruction but without the focusing of a preinstructional strategy.

2) Non-direct or indirect instruction compared to direct instruction
resulted in no difference in the general effectiveness of one approach over
the other. This cluster of studies was characterized by design weaknesses and
significant findings both for and against a given instructional strategy.

3) The mean cognitive outcome effect size of -.22, though slightly
favorable to the deductive over the inductive teaching strategy, must be accepted
with caution since it is not significantly negative and only one of the ten
studies reported significant differences between the outcomes of the two
strategies. As in the previous cluster, no firm general conclusion can be drawn
regarding the effectiveness of one strategy over the other.

4) When the indirectness cluster findings are combined with the inductive
versus deductive cluster findings, a pattern of effect sizes against grade
level led to the conclusion that deductive or direct instruction tends to be
more effective in terms of cognitive outcomes with junior high level (grades
6-9) students in required courses, while indirect, non-direct or inductive
instruction was more effective with senior high (grades 10-12) students in
elective courses.

5) Training in scientific thinking, especially the use of logical
operations, is effective in terms of cognitive outcomes when conducted on an
individual basis by a special teacher. Only two studies with significant effect
sizes involved class size groups, one with the regular and one with a special
teacher. The mean effect size for training, .89, is significantly positive and
equivalent to a 30 percentile point improvement when compared to untrained
control subjects.

6) More highly structured verbal content in printed or audio materials
is more effective in promoting cognitive learning than less structured content.
The mean effect size of .74 is significantly positive and equivalent to about 27
percentile points between the low structure group and high structure group
means.

7) An insufficient number of studies were found reporting attitudinal
or laboratory outcomes to draw any general conclusions about what aspects of
quality of instruction have favorable or unfavorable effects.
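
The percentile-point equivalents quoted in conclusions 1, 5, and 6 follow from reading an effect size as a shift in standard deviation units on a normal curve. The sketch below assumes normally distributed outcomes and a control group centered at the 50th percentile; the function name is hypothetical. It reproduces the quoted figures to within about a point (the text appears to round down).

```python
from statistics import NormalDist


def percentile_gain(effect_size):
    """Percentile points gained over the control-group median (50th percentile),
    assuming normally distributed outcome scores."""
    return 100 * NormalDist().cdf(effect_size) - 50


# Mean effect sizes reported in the text for three clusters.
for label, es in [("preinstructional strategies", 1.03),
                  ("training in scientific thinking", 0.89),
                  ("structure in verbal content", 0.74)]:
    print(f"{label}: ES = {es:.2f}, about {percentile_gain(es):.1f} percentile points")
```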

Cognitive Outcome General Trends

Examination of the 57 comparisons of cognitive outcomes, including 52
median effect sizes, provides some insight into the general effectiveness of
systematic innovation in instruction. All studies were coded such that the
experimental treatment represented a departure from the norm or "traditional"
instructional practice. Twenty-three of the 57 comparisons (Table 3) were
significantly positive while only three were significantly negative. The
mean cognitive effect size was .55, significantly positive and favorable to
experimental treatments. Removing those comparisons (14) where control group
access to content was less than that of the experimental group slightly lowers the
mean effect size to .51, still significant and equivalent to an improvement
of approximately 20 percentile points over a control group.

The influence of study variables on effect size was investigated by
computing and comparing the effect sizes corresponding to various subgroups
of studies. Only studies with comparable content access by both treatments
were included. Nine major study variables, subgroups of values, corresponding
effect sizes, and F-test results are shown in Table 4. None of the
differences among subgroups is significant. One trend deserves noting:
published outcome measures tended to yield larger effect sizes.
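
A subgroup comparison of this kind can be sketched as a one-way analysis of variance on the effect sizes. This is a minimal illustration, assuming an ordinary one-way F-test (the passage says only "F-test"); the function name and the sample effect sizes are hypothetical and are not values from Table 4.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of subgroup means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual effect sizes around their own subgroup mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical effect sizes grouped by source of outcome measure.
local_measures = [0.10, 0.35, 0.50, 0.45, 0.60]
published_measures = [0.70, 0.95, 0.80]
f_value, df1, df2 = one_way_f([local_measures, published_measures])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The resulting F is then compared with the critical value for its degrees of freedom; with small subgroups, even sizable differences in mean effect size can fall short of significance.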

Insert Table 4 about here 

Correlations between effect size and the study variables of sample size,
grade level, and reliability of outcome measure were computed and found to be
-.02, -.10 and .00 respectively. All three were non-significant.

Methodological Quality 

An index to the general quality of the studies synthesized is the
breakdown of design characteristics and threats to validity. Seventy-two percent
of the studies were true experiments using random assignment of subjects to
treatments; 28 percent were quasi-experiments, 5 percent of these using
matching. Based on references in the study reports to precautions taken to
insure the treatments were reliably implemented, 44 percent of the studies
were judged to have low reliability of treatment implementation, 37 percent
adequate reliability, and 19 percent high reliability. This weakness in design
(or reporting) might be remedied using a verification of treatments approach such
as that described by Leonard and Lowry (1979). The percent of studies judged to
be probably flawed by other threats to validity is given here with the associated
threat: 65 percent, inadequate statistical power; 2 percent, error rate;
24 percent, maturation; 49 percent, history; 27 percent, selection bias; 59 percent,
contamination, compensation or differential incentives; 19 percent, mortality;
and 34 percent, generalizability. Since no study involved random selection of
subjects from a larger, well identified population, the last percent is a rather
conservative estimate.

The rather high rate of threats to study validity might call into question
the .51 overall cognitive outcome effect size reported earlier. To check the
relationship between design quality and effect size, the design quality index
defined earlier was correlated with effect size for the 38 studies where both
treatment groups had equal access to the content of instruction. The
correlation was .21 (p = .09), indicating a trend toward higher effect sizes with
improved research design. As the number of design flaws diminishes, the
difference between experimental and control means increases.



Results, Analysis and Conclusions: Quantity of Instruction

Three quantity of instruction construct studies were found by searching
both the published and dissertation literature. Two of the studies were very
similar: Welch's (1968) and Economos' (1972). Both studies had teachers
of physical science (Harvard Project Physics and Introductory Physical Science
respectively) keep track of their total teaching days over set units of
material. The unit test served as a criterion measure of cognitive achievement.
Welch found a non-significant -.08 correlation between teaching days and Unit
1 achievement based on the class means of 41 teachers. Economos found two
non-significant correlations, .29 for Unit 1 and .17 for Unit II, based on the
20 class means of five teachers.

The third study (Tomera, 1974) compared the effect of two weeks of training
in observation and comparison skills in seventh grade life science with four
weeks of similar training (total n=80). After five months, no significant
difference in ability to use the skills was found.

Taken as a whole, the three studies indicate that simply expanding the
amount of time spent on a given unit of material holds no special
relationship to amount learned. Since how the time was spent in each classroom was
not reported, nothing about how to teach to a comparable level of
achievement in a shorter period of time can be concluded.

Summary and Recommendations

The task of this study was to identify quality of instruction clusters
of five or more studies of the same or similar independent variables in
the published science education grade 6-12 research of the 1963-1978 period,
to quantitatively synthesize the studies within and across clusters, and to
comment on the general quality of and gaps in the research. Fifty-two of
95 studies met the 5 study criterion and revealed significant positive
cognitive outcomes due to each of four types of instructional interventions:
the use of preinstructional strategies, training in scientific thinking,
increased structure in the verbal content of materials, and increased
realism or concreteness in adjunct materials. Indirectness of instruction and
inductive strategies showed no effect in general over direct or deductive
strategies, but a trend toward more effectiveness of the indirect or inductive
approaches in grades 10-12 and direct or deductive approaches in grades 6-8
was found. Combining the results of all clusters, systematic innovation in
instruction resulted in significantly positive improvements over the norm or
traditional practice.

Methodologically, the research was judged particularly weak in reliability
of treatment implementation and particularly vulnerable to threats of history
and contamination, compensation or differential incentives. Improved
design quality was related to larger effect sizes.

Certain recommendations evolve from the findings and the general
experience of conducting this kind of research synthesis.

1) The replication of studies is important, but the replication need
not rigorously follow in detail an earlier study. All studies are flawed
and limited in some way. Variation in flaws and strengths as well as in
subject population can add to the generalizability of the cumulative results.
To be useful in a practical sense, instructional interventions must be
sufficiently robust to give positive results under a variety of less than
optimal situations.

2) Several constructs, besides the quality of instruction, compete in
explaining science learning. More of these constructs should be measured
and brought into the analysis, especially in quasi-experiments. Even
experimental designs would be improved if such factors as ability, motivation and
classroom environment factors could be statistically removed and not
assumed to be neutralized by random assignment. This multivariate approach
would also allow a better accounting of the sources of variance in outcomes
and thereby lead to better prediction and control.

3) Research on the attitudinal impact of various instructional
interventions is needed. Routinely, studies should consider multiple outcomes
on both an immediate and long term basis. Few studies had delayed
follow-up measures of any kind.

4) Findings from general education research should inform science
education research. For example, the research on direct instruction
techniques (Rosenshine, 1979) in lower grades should be examined and applied in
science lessons to determine its limits of effectiveness.

5) Study reports should typically include the means and standard
deviations of all treatment group outcome measures to make future
quantitative syntheses possible and easier. Also, the generalizability of
individual studies as well as future syntheses would benefit from greater
attention to the description of the populations represented by the sample
of students actually receiving experimental treatments. This should
include at least community occupational composition, SES, and
urban-suburban-rural character.

6) More attention needs to be given to insuring the reliability of
treatment implementation and minimizing associated threats to study
validity.

Without a quantitative synthesis of the research, the findings of this
study would have remained qualitative and directional at best. The
quantification of effect sizes and study variables has allowed a more objective
and precise representation of the literature reviewed. The relatively small
number of studies in each cluster has meant larger confidence intervals, making
significance of the findings more difficult to attain but, where
attained, more convincing. As the body of research literature grows, additional
studies will form new data points in new clusters, building toward confidence
in the general pattern of research findings.
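
Recommendation 5 above asks for means and standard deviations because a standardized effect size cannot be computed without them. The sketch below shows two common conventions: a difference scaled by the control group standard deviation (Glass-style) and one scaled by a pooled standard deviation. Which convention this synthesis used is not stated in this section, so both are given as assumptions; the function names and sample numbers are hypothetical.

```python
import math


def glass_delta(mean_exp, mean_ctrl, sd_ctrl):
    """Mean difference in control-group standard deviation units."""
    return (mean_exp - mean_ctrl) / sd_ctrl


def pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl):
    """Pooled within-group standard deviation of two treatment groups."""
    numerator = (n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    return math.sqrt(numerator / (n_exp + n_ctrl - 2))


def pooled_effect_size(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """Mean difference in pooled standard deviation units."""
    return (mean_exp - mean_ctrl) / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)


# Hypothetical unit-test results for one study (not data from the report).
print(glass_delta(mean_exp=55.0, mean_ctrl=50.0, sd_ctrl=10.0))
print(pooled_effect_size(55.0, 12.0, 30, 50.0, 10.0, 30))
```

When a report gives only a t or F statistic, the effect size has to be reconstructed indirectly, which is why reporting the raw means and standard deviations makes later syntheses easier.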
Table 1

Category and Frequency of Published
Quality of Instruction Studies 1963-1978

Category                          Frequency

Teacher/Student Modes                 2
Role of Illustration                  2
Tea. vs Stu. Generalization           2
Non-Direct vs Direct Inst             5
Teacher Experience                    1
Open Ended Inst                       1
Training in Log. Operations           8
Original Sources                      1
Type of Discussion                    2
Part vs Whole Film                    1
Verbal vs Picture Mode                1
Pacing of Instruction                 2

Note: Each study related the independent variable to a measure of science learning.
Table 2

Study Clusters and Definitions

Cluster            Component Variables     Operational Definitions                   No. of Studies (1)

Preinstructional   Advance Organizers      Identified as such by study author
Strategies         Behavioral Objectives
                   Set Induction

Directness of      Direct vs. Non-Direct   Teacher or workbook controlled
Instruction                                instruction compared to instruction
                                           allowing greater student choice in
                                           content and/or method
                   Indirect/Direct Ratio   Used Flanders Interaction Analysis

Inductive vs.      Same as cluster         Sequence of instructional components
Deductive                                  in two competing treatments such that
Strategies                                 one proceeded from rule or generalization
                                           to examples while the other reversed
                                           this sequence

Training in        Training in Logical     Training in some Piagetian task related
Scientific         Operations              skill or logical operation
Thinking           Training in Science     Identified as such by study author
                   Processes
Table 2 (Continued)

Cluster            Component Variables     Operational Definitions                   No. of Studies

Structure in the   Same as cluster         Studies comparing higher with
Verbal Content of                          lower structure materials using
Materials                                  Anderson's (1971) definition
                                           of structure

Realism or         Same as cluster         Studies comparing treatments with               9
Concreteness in                            adjunct materials at different
Adjunct Materials                          points on the concrete-symbolic
                                           continuum

1. Table 1 frequencies were first estimates of potentially useful studies and may not agree with the final
number of studies listed here.
Table 3

Cognitive Outcomes by Cluster of Studies

                                           Comparisons                        Effect Sizes
                   Number of            Positive  Positive  Negative                  .95 conf.
Cluster            Studies       n      Total     p < .05   p < .05      n    mean    interval

Preinstructional                                     7         0         9    1.03     ±.68
  Strategies
Indirect                                                       2        10     .11     ±.27
  Instruction
Inductive                                                                7    -.22     ±.25
  Strategies
Training in                                                             11     .89     ±.59
  Scientific
  Thinking
Structure in                                                             6     .74     ±.27
  Verbal Content
  of Materials
Concreteness                                                             9     .58     ±.22
  in Adjunct
  Materials
Totals                51         57       41        23         3        52     .55     ±.21
Table 4

Study Variable Subgroup Comparisons

                                            Subg. 1      Subg. 2      Subg. 3
Variable           Subgroups 1, 2 & 3       n    ES      n    ES      n    ES       F      p

Design             1) Quasi-Exper           9   .43     29   .54                   .16    .69
                   2) True Exper

Grade Level        1) 6 through 9          23   .60     15   .38                   .83    .37
                   2) 10 through 12

Type of Course     1) Elective             12   .57     13   .60     13   .38      .31    .73
                   2) Required
                   3) Combination

Student Ability    1) High                  5  1.08     24   .40      9   .52     1.90    .16
Level              2) Average
                   3) Low

Component          1) Teacher Behavior      1   .84     21   .44     16   .59      .31    .74
Manipulated        2) Materials
                   3) Combination

Experimental       1) Regular              15   .55      6   .64     17   .44      .19    .83
Treatment          2) Special
Teacher            3) Materials Only
Table 4 (Continued)

                                            Subg. 1      Subg. 2      Subg. 3
Variable           Subgroups 1, 2 & 3       n    ES      n    ES      n    ES       F      p

Focus of           1) Non-Lab              18   .50     16   .54      4   .48      .02    .98
Instruction        2) Lab
                   3) Combination

Experimental       1) Individuals          22   .46      4   .52     12   .61      .15    .86
Treatment to       2) Small Group
                   3) Class Group

Length of          1) Less than 1 hr.      10   .66     12   .43     16   .47      .31    .74
Treatment          2) 1 to 10 hrs.
                   3) Greater than 10 hrs.

Source of          1) Local                29   .41      9   .84                  2.41    .13
Outcome Measure    2) Published

Note: ES means mean effect size.
Figure 1

Cognitive Median Effect Size with Grade
Level of Studies in Two Clusters

[Scatter plot. Vertical axis: Effect Size, from -1.0 to +1.0, with the two ends
labeled "direct, deductive strategies" and "non-direct, inductive strategies".
Horizontal axis: Grade Level, 8 through 12.]

Note: "o's" indicate indirectness of instruction cluster studies,
"x's" indicate inductive vs deductive cluster studies.
The correlation of Effect Size with Grade Level is .48.
Social-Psychological Environments and Learning:
A Quantitative Synthesis 

During the past decade and a half, educational researchers 
and evaluators issued or published about one hundred reports 
concerning student perceptions of the social-psychological 
dimensions of their classroom group such as cohesiveness, satis- 
faction, goal direction, difficulty, competitiveness, and friction. 
Reviews of this work (Randawa & Fu, 1973; Shulman & Tamir, 1973; 
Walberg, 1974, 1976, and Moos, 1979) discuss theoretical, 
methodological, and practical issues and conclude that such 
perceptions are useful as independent, mediating, and dependent 
variables in educational investigations in natural settings. 
Much of the research shows that social-psychological perceptual
scales are reliable, and are sensitive to educational treatments
such as curriculum, teacher training, and instructional
innovations, as well as to project efforts to increase teamwork,
cross-sex, cross-ethnic-group cooperation, and similar group 
properties. Other work reveals that such perceptions reflect 
and mediate teacher and student characteristics and that they 
provide diagnostically-valuable profiles of classroom climate 
and individual morale. 

The focus of the present work is the predictability of
end-of-course cognitive, affective, and behavioral learning from
mid-course social-psychological perceptions, with and without
statistical control for beginning-of-course measures, ability,
or both. Even if constructive perceptions of the social
environment are considered worthy ends in their own right, it is
important to determine if they are positively associated with
learning gains and outcomes. Consistent, positive associations 
indicate that it is unlikely that learning is being traded off 
for more constructive social-psychological morale; under certain 
assumptions (Walberg, 1976) , such associations may indicate 
causal connections between social-psychological perceptions and 
learning. 

It should, of course, be acknowledged that there are many 
educational, psychological, sociological, and even anthropological
approaches to the measurement or operationalization of the social- 
psychological environment, climate, or morale of classes and 
schools. Behavioral psychologists, for example, in the study of 
groups, have often emphasized the frequency of leader and member 
behaviors (Bales, 1950). One sociological tradition has analyzed 
the socio-economic and racial-ethnic composition of classroom, 
peer, and school groups (Coleman, 1961); and anthropologists 
have studied the cultural relevance of classroom speech and other 
interactions to learning in ethnographic accounts (Tikunoff, 
Berliner, Rist, 1975) . It seems premature and certainly beyond 
the scope of the present synthesis to analyze and integrate these 
somewhat disparate approaches; and the scope of the effort is 
therefore restricted to student ratings of their perceptions of 
social-psychological characteristics of their classes and
schools, a topic that includes a sufficient amount of quantitative
information on the environment-learning relation (with statistical 
controls) while maintaining construct continuity of psychological 
constructs across the studies analyzed. 

Reviews of research reported in twelve studies of ten large, 
independent data sets show that student perceptions of classroom 
climate can account for significant variance in a variety of 
cognitive, affective, and behavioral learner outcomes. It has 
not seemed possible until recently, however, to summarize the 734 
correlations in these studies to determine, for example: 
Which perceptions are most predictive? What learnings are most 
predictable? And, how does the predictability vary across such 
factors as grade levels of students, subject matter, and 
methodological characteristics of the studies? Jones and Fiske 
(1953), Light and Smith (1971), Gage (1978), and Rosenthal (1976)
describe a number of quantitative techniques, which are employed
here, to synthesize the quantitative findings across studies and
to provide answers to such questions. (See Glass, 1978, for a
critical exposition.) As in quantitative summaries of empirical
works in the natural sciences, the techniques are intended to
provide estimates across investigations of: the consistency of
observations or coefficients such as means, correlations, and
regression weights; their average magnitudes and margins of
error; and their boundaries of application. The present application
draws on the techniques developed by Glass (1978) for meta-analysis,
by Mosteller and Tukey (1977) for obtaining appropriate
error estimates when some of the data are not independent, and by
ourselves for weighting independent data sources equally as well 
as estimating simultaneously the complete set of possible 
determinants of the correlation coefficients. 

Method

Sample of Studies 

A search was made of Dissertation Abstracts, Education
Index, Psychological Abstracts, Social Science Citation Index,
and, since much of the relevant research involved science
curricula, the annual summaries of research sponsored by the
National Association for Research in Science Teaching for the
years 1963 through 1977. Ongoing, unpublished studies known by
the authors or those cited in recent works were also considered
for inclusion. Studies were considered that involved naturalistic
classroom settings, kindergarten through twelfth grade, and that
reported simple, partial, and part correlations between student
perceptions of social-psychological climate of their classes
and end-of-course learning.

Many different qualities of the student-perceived social- 
psychological environments of classrooms have been quantified. 
Because of the difficulties involved in determining whether or 
not subscales of different instruments measure the same construct,
a single instrument, the Learning Environment Inventory (LEI), by
Anderson and Walberg (Note 1), was designated the "anchor instru-
ment" for the research synthesis, and correlations from various
studies were categorized as involving one of the subscales on the
LEI. There were several reasons for the selection of the LEI as 
"anchor." First, the LEI incorporates a broad range of 15 sub- 
scales measuring different aspects of the learning environment 
and reflecting a broad conceptualization of social-psychological 
dimensions found in many social collectivities such as hospitals,
prisons, work groups, corporations, and fraternities (Insel
& Moos, 1974). Second, the psychometric properties of the LEI,
including the reliability and factorial purity of its many sub-
scales, have been thoroughly investigated (Anderson and Walberg,
Note 1). Finally, ten of the existing studies meeting all other
criteria for inclusion employ the LEI itself or instruments
derived directly from the LEI. These are among the studies
listed in Table 1. The search and selection procedures yielded
twelve investigations of ten data sets that report 734 correla-
tions calculated from a combined total of 17,805 students in 823
classes (Table 1). The correlations from three studies of a single
data set (Walberg, 1969a and b, and 1972) that explored
predictability of learning across units of analysis and statistical-
control techniques were counted as a single data source in the
main regression analyses as explained in a subsequent section.

Features of Studies

Table 2 summarizes the key features of the twelve studies.



Insert Table 2 about here 



Information on each study includes a description of the environ- 
ment measures used, the variables controlled, learning outcomes, 
and a brief statement of results. 

In general the environment measures employed have internal- 
consistency reliabilities between .41 and .86. No strong systematic 
relationship is evident between the reliability of the measures 
and the grade level at which data were collected. 

The outcome measures include not only standardized achieve-
ment tasks but affective and behavioral measures as well. Ten
of the twelve studies include some cognitive measure of achievement;
seven include affective or interest measures; and five employ
a behavioral measure such as daily attendance. 

Nine of the twelve studies are statistically controlled. 
Most studies control for the corresponding pretest; four studies 
control for student aptitudes such as IQ, and two control for 
instructional variables such as teacher attitude. 

Although the focus of the present synthesis is the magnitude
of the correlations for specific environment scales and learning
outcomes, the multivariate results for sets of environment
scales in the last column of Table 1 may be noted. Seven studies
added sets of these scales to regression equations containing
ability or pretest measures or both as controls, and reported
the percentage increment in accountable variance. The average
incremental variance accounted for on 19 learning outcomes is
20 percent with a range of 1 to 54 percent. Thus, regressions
containing control and perceptual variables account for large
amounts, in some cases nearly all, of either the total or
reliable variance in learning outcomes.

Characteristics of Correlations 

Information on eight characteristics was recorded for each 
simple, part, or partial correlation: national location of the 
study; grade level and number of students; unit of analysis;
type of correlation; type of social-psychological perception;
outcome domain; and content area of subject matter (Table 3).
The continuous variables, grade level and number of students, were
grouped into class intervals to calculate frequencies and one-way
analyses of variance, but were left in their full continuous
precision for the regression analyses. The nominal variables
such as location and unit of analysis were treated as categorical
in the analyses of variance and converted to sets of binary (0,1)
variables in the regressions, as explained in the discussion of
the results. 

Most of the variables listed in Table 3 are self-explanatory.
However, a few deserve explanation. Unit of analysis refers to 
the level of aggregation used in the data analysis, i.e., student, 
sub-group, class, or school. With the exception of Moos and Moos 
(1978), and Bardsley (1976), all investigations employed the 
Learning Environment Inventory (LEI) in original or simplified 
or shortened form to reduce the 25-minute time required for the
full 105 items for the 15 scales. For example, Perkins (1976)
and Talmage and Walberg (1978) employed the 45-item My Class
Inventory, an adaptation of the LEI for elementary school. Since
the five scales on the Moos and Moos instruments and adaptations of
the LEI correspond closely, it was possible to code all scales
to correspond to the LEI scales. 

Outcome domain refers to the type of criterion measure used. 
Criterion learning measures are coded as either cognitive, 
attitudinal or behavioral. Types of cognitive measures include 
conventional multiple-choice achievement tests and tests of 
understanding, critical thinking, and tests of formal reasoning. 
Attitudinal criteria include instruments such as interest measures 
and motivation and self-concept tests. Behavioral criterion 
measures include self-report activity inventories and absence 
rates. 

The last item coded for each correlation coefficient is the 
outcome content area. Each correlation is coded for one of the 
following subjects: general science, life science, physical 
sciences, mathematics, social sciences, humanities, general
achievement, or miscellaneous subject areas, which are aggregations
across several of these categories.

Data Analysis 

In the analyses, the magnitude of the correlation of environ- 
ment scale and learning outcome is the dependent variable. The 
specific LEI scale, outcome domain, the sample size, unit of 
analysis, and the other explanatory factors are the independent
variables. Several analytical techniques were used, beginning
with a tabulation of the signs of the correlation coefficients
by expected direction, proceeding through one-way analyses of
variance, and culminating in a series of multiple regression
analyses. In the regression analyses, correlations were weighted
to equalize the contributions of the different studies. The
rationale for these procedures and details of their execution
are explained in the next section, but the weighting issue
deserves discussion here. 

The 734 correlation coefficients are by no means statistically
independent since they arise from only twelve studies and ten
independent data sets. Table 1 shows, moreover, that the number
of correlations taken from individual studies varies from 5 to 240.
Weighting each correlation equally would give the largest study
48 times the weight of the smallest. For these reasons, special
procedures were developed for the regressions to give each data
set equal weight.
Tukey's "Jackknife" procedure (Mosteller & Tukey, 1977; Glass,
1978) was used to obtain estimates of the regression weights and
their standard errors, which provided statistically valid tests
of the significance of each predictor.

Results and Discussion 

Directional Hypotheses 

The 15 LEI subscales include some positive and some negative 
characteristics of the classroom environment. The positive 
subscales are: Cohesiveness, Satisfaction, Task Difficulty, 
Formality, Goal Direction, Democracy, Environment, and Compe- 
tition. The negative scales are: Friction, Cliqueness, Speed, 
Apathy, Favoritism, Disorganization, and Diversity. In subsequent 
analyses the signs of correlations involving negative aspects of 
the classroom environment are reversed. Thus, the expected signs 
of correlations with all LEI scales, as coded for these analyses, 
are positive. 

From social-psychological research, Walberg (1969b) derived 
36 hypotheses concerning the direction of relations between 
selected LEI scales and learning criteria, namely that Cohesive-
ness, Satisfaction, Task Difficulty, Goal Direction, Democracy,
Diversity, and Environment would be positively correlated with
learning outcomes, and that Friction, Cliqueness, Apathy, Favori-
tism, and Disorganization would be negatively correlated with cognitive,
affective, and behavioral learning. Data assembled for the present
research synthesis permitted the testing of these hypotheses by
tabulating the number of studies in which correlations of each
sign were found, for each scale. The prediction for each of the
12 scales gave rise to 3 testable assertions, namely that
correlations between that scale and measures in each of the
three outcome domains would have the predicted sign. The
resulting 36 testable assertions were evaluated as follows. Each
study employing a given combination of scale and outcome was
examined to determine whether the preponderance of coefficients
is positive, negative, or evenly split. The three Project
Physics studies that explored the consistency of correlations
across class and individual units of analysis and analytic
techniques were combined since they arose from a single data
set (Tables 1 and 2). Ties were broken by assigning even
splits the values plus, then minus, then plus, and so on. A
tabulation of the results shows that 31 of the 36, or 86 percent,
of the signs support the hypotheses; under the null hypothesis of
an even split, the binomial probability of so lopsided a result
in a sign test is less than .001. Three of the
five disconfirmations concern the Diversity subscale, which shows
a negative relation with outcomes in all three domains, rather than
the hypothesized positive relation.
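
The sign-test probability quoted above can be checked directly from the
binomial distribution. The following is a minimal sketch, assuming a
one-tailed test against a null probability of .5 for each sign; the function
name is illustrative and is not taken from the original analysis.

```python
from math import comb


def sign_test_upper_tail(successes: int, trials: int) -> float:
    """One-tailed probability of at least `successes` confirmations in
    `trials` assertions when each sign is equally likely (p = .5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# 31 of the 36 testable assertions had the predicted sign.
p_value = sign_test_upper_tail(31, 36)
print(f"proportion confirmed = {31 / 36:.2f}, one-tailed p = {p_value:.2e}")
```

Under these assumptions the probability falls well below the .001 level
reported in the text.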

Unweighted, Univariate Analyses 

Table 3 reports the means, standard deviations, and 
frequencies of the 734 correlations grouped according to each 
of the eight factors individually. These descriptive statistics 
and the F-tests for the corresponding correlation-weighted, one-
way analyses of variance are intended to show trends, variations,
and frequencies in the correlations, for each of the eight
factors separately, before presenting the results of the regres-
sions. The latter combine all eight factors in a simultaneous
analysis that controls each factor for the others and points to 
a few strong trends that summarize much of the variation in the 
magnitudes of the correlations. 

The F-ratio for location of study is highly significant
(F = 35.58, df = 3,730, p < .001). Table 3 shows that the
correlations from studies in India and Canada are higher than
those in the United States and Australia. This finding may be
explained by reference to the specific studies from which these
correlations were taken. The single study in India (Walberg,
Singh, and Rasher, 1977) used only extreme groups of students,
nominated as most and least studious in their classes. These
subgroups were the unit of analysis. The single Canadian study
(Walberg and Anderson, 1972) is distinctive in that class,
rather than student, was the unit of analysis. Aggregation 
effects may have raised correlations from these countries since 
analyses of collectivities usually yield stronger correlations. 

The F-ratio comparing correlations by grade level is not 
significant (F = 0.10, df = 2,731), which may be due in part 
to the unequal group sizes which resulted from the collapsing 
of the twelve grades into only three levels (elementary, junior 
high, high school). In the regression analyses discussed below, 
the grade levels were not collapsed; and grade level proved to 
be a strong predictor of correlation size. 

Grouping the correlations by sample size yields a signifi-
cant F-ratio (F = 23.25, df = 3,730, p < .001). The absence of
a linear trend across the different sample size categories, however,
may suggest that the apparent effect of sample size is attributable
to variations across studies in unit of analysis, type of correla-
tion coefficient, type of outcome measure used, or other factors.
The regression analyses were employed to test these possibilities. 

With the exception of the mean correlation for "subgroups," 
the correlations show stronger relationships with larger units 
of analysis. Differences among the mean correlations with
different units of analysis are clearly significant (F = 19.44,
df = 3,730, p < .001). The anomalous value for subgroups may
again be explained by reference to the peculiarities of the
single study using this unit of analysis, by Walberg, Singh, and
Rasher (1977). The method used to select a sample in this study
may have given rise to unusually high correlation coefficients.
Aside from the "subgroups" anomaly, the strength of the environ-
ment-outcome relation, uncontrolled for the other seven factors,
increases as larger and larger units are examined.

No significant differences were found between simple, part, 
and partial correlation coefficients (F = .83, df = 2,731). In 
some cases, partialling out ability or pretest scores or both and 
analyzing adjusted scores may increase precision and raise the 
correlation. In other cases, this increase may be more than 
offset by the attenuation due to the lowered reliability of the 
adjusted scores.

Differences among aspects of the environment measured are
highly significant (F = 4.43, df = 14,719, p < .001). The sub-
scales which show the strongest relations to learner outcomes are
Cohesiveness, Friction, and Satisfaction, all of which show
average correlations of over .22 with outcomes. These are
followed by Cliqueness, Difficulty, Apathy, Favoritism, Direction,
Democracy, Disorganization, and Environment, with averages in the
range from .12 to .16. The remaining four subscales, Speed,
Formality, Diversity, and Competition, all show average correla-
tions of less than .07.

The analysis of variance for outcome domain showed signif- 
icant differences among the three outcome domains (F = 19.67, 
df = 2,731, p < .001). Higher correlations were observed with
outcomes in the cognitive domain than with those in either the
attitudinal or behavioral domains.

The last of the eight one-way analyses of variance contrasts
the eight content areas in which outcomes were related to
environments (F = 12.21, df = 7,726, p < .001). Table 3 shows
that the content areas in which outcomes are most predictable
from environments are mathematics and the social sciences,
followed by general science, the physical sciences, and the
humanities. The category "general achievement" includes
standardized test scores summed over several content areas.
These indexes are all in the cognitive domain, and the relatively
high mean correlation for this area may reflect primarily the
exceptional reliability of such measures.

Regression Analyses and Jackknifed Estimates

Before the regression analyses, the categorical variables
were replaced by sets of binary variables. Location, for
example, was recoded into four variables, called "USA," "Canada,"
"Australia," and "India." Each of these variables was given an
identifying value of either zero or one for each correlation.
For a correlation computed on a sample from the United States,
"USA" is 1, and "Canada," "Australia," and "India" are each 0.
For an Australian study, "Australia" would be 1 and the rest 0,
and so on. If the values of three of these variables are known,
the fourth can always be determined (is redundant); therefore
only three need be entered in the regression, and the convention 
was adopted to omit the last value of each categorical variable 
in Table 3. The continuous variables, grade and sample size, 
were left in their full metric precision. 
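
The binary recoding just described can be illustrated with a short sketch.
It assumes, for illustration only, that the location categories are ordered
as they are named in the text, so that "India" is the omitted last category;
the actual order in Table 3 may differ. The function name `dummy_code` is
hypothetical.

```python
# Category order is an assumption taken from the order of mention in the text.
LOCATIONS = ["USA", "Canada", "Australia", "India"]


def dummy_code(value, categories):
    """Return 0/1 indicators for every category except the last one,
    which serves as the omitted (redundant) reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {name: int(name == value) for name in categories[:-1]}


print(dummy_code("USA", LOCATIONS))        # one indicator set to 1
print(dummy_code("Australia", LOCATIONS))  # a different indicator set to 1
print(dummy_code("India", LOCATIONS))      # reference category: all zeros
```

A correlation from the omitted category is identified by zeros on all three
retained indicators, which is why the fourth variable carries no additional
information.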

In addition to recoding categorical variables, weights were 
introduced to equalize the contributions of the ten data sets 
to the estimates. Each correlation was given a weight propor-
tional to the inverse of the number of correlations from its
study. These weights were scaled so that the average weight for
each coefficient was 1.00; thus the sum of the weights is the
number of coefficients in the sample. 
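
The weighting scheme can be sketched as follows, using the numbers of
correlations listed in Table 1. The sketch assumes that the three Walberg
(1969a, 1969b, 1972) studies are pooled into a single data set of 224
correlations, as described earlier; the function name is illustrative.

```python
# Correlations per data set (Table 1); the last entry pools the three
# Walberg (1969a, 1969b, 1972) studies, which share one data set.
correlations_per_data_set = [7, 11, 10, 5, 5, 240, 22, 150, 60, 224]


def equalizing_weights(counts):
    """Weight for each correlation in a data set: inversely proportional
    to the data set's count, scaled so the weights average 1.00 over all
    correlations (their sum equals the total number of correlations)."""
    total = sum(counts)
    scale = total / len(counts)
    return [scale / n for n in counts]


weights = equalizing_weights(correlations_per_data_set)
for n, w in zip(correlations_per_data_set, weights):
    print(f"{n:4d} correlations, weight {w:7.3f} each, total {n * w:.1f}")
```

Each data set then contributes the same total weight, and a correlation from
the smallest data set carries 48 times the weight of one from the largest,
offsetting the imbalance noted under Data Analysis.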

After the recoding and dropping of one variable from each
binary set, 33 variables were available for the regression
analysis: 3 variables for location, 1 variable for grade level
(not recoded), 1 variable for sample size (not recoded), 3
variables for unit of analysis, 2 variables for correlation type,
14 variables for subscales, 2 variables for outcome domain, and 7
variables for content area. Consideration of variables representing
interactions of LEI scales with sample and study characteristics 
could introduce still more variables. It would be quite unusual 
if all of these variables possessed significant power to predict 
the size of the correlation coefficients after controlling for all 
the other variables. The problem, therefore, was to decide which 
of the 33 variables and additional interactions should be included 
in the final regression equation. To screen the variables, a
multi-stage procedure was employed, whereby weak or collinear
predictors were successively eliminated. First, a run was made
using all 33 predictors. Then those with F-ratios of less
than 1.00 were eliminated, and a second run was made. Then all
remaining variables with F-ratios less than 2 were dropped, and
the remaining variables were used in a third run. Variables
in this run which showed F-ratios of less than 4 were eliminated,
and a fourth run was conducted. This run included only 18 of the
original 33 variables. The regression with these 18 variables
will be referred to as the reduced model. At this point, sets
of product variables were introduced that measure the influence
of interactions of significant LEI scales with grade level and
unit of analysis. To determine which of these product variables
possessed additional explanatory value, a stepwise procedure was
employed, in which the 18 variables already identified were forced
into the equation and product terms were then entered one at a
time, using an F-ratio of 4.00 as the criterion for entering each
new variable. The final criterion corresponded closely to
significance at the .05 level. A total of 32 variables were
included in the final equation. In addition to the 18 variables
in the last reduced equation, 14 cross-product terms, representing
interactions, were introduced. The final equation with 32
variables will be referred to as the product model. 
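
The staged elimination that produced the reduced model can be sketched as
follows. This is an illustration only, not a reconstruction of the original
program: it assumes a caller-supplied function, here called `f_ratios`
(hypothetical), that refits the regression on the retained predictors and
returns an F-ratio for each.

```python
def screen_predictors(variables, f_ratios, cutoffs=(1.0, 2.0, 4.0)):
    """Staged screening: fit, drop predictors below the cutoff, refit.

    variables: names of the candidate predictors.
    f_ratios:  callable taking the retained names and returning a dict
               of F-ratios from a regression on those predictors only.
    cutoffs:   the successive F-ratio thresholds named in the text.
    """
    kept = list(variables)
    for cutoff in cutoffs:
        ratios = f_ratios(kept)  # one regression run per stage
        kept = [name for name in kept if ratios[name] >= cutoff]
    return kept  # predictors entering the final (fourth) run
```

Because the F-ratios are recomputed at every stage, a predictor that
survives one cutoff may still be dropped later once collinear competitors
have been removed and its own F-ratio has changed.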

Conventional significance tests computed for the regression 
coefficients assume that the correlations are statistically
independent. Since in all cases several or many correlations
are taken from the same study, this assumption is not met.
Accordingly, the significance of the coefficients was estimated
using the "jackknife" (Mosteller & Tukey, 1977), assuming studies
to be independent, but making no assumptions about the independence
of two or more correlation coefficients taken from the same study.
To apply the jackknife, the final regression equations for the
reduced (18-predictor) and cross-product (32-predictor) models
were computed ten times, each time omitting all correlations from 
one of the data sets. These ten regression equations, together 
with the original equation were then used to obtain new, robust 
estimates of the unstandardized regression coefficients and their 
standard errors. For each of these estimated b-weights, a t-ratio 
was computed, on nine degrees of freedom (since there were ten 
data sets) . 
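
The leave-one-data-set-out procedure can be sketched in general form as
follows. The sketch uses the usual pseudo-value formulation of the jackknife
and, for brevity, applies it to any scalar estimator rather than to a full
regression; the function name and the example values are illustrative and
are not taken from the studies.

```python
from math import sqrt
from statistics import mean, stdev


def grouped_jackknife(groups, estimator):
    """Leave-one-group-out jackknife for a scalar estimator.

    groups:    list of lists, one list of observations per data set.
    estimator: function mapping a flat list of observations to a number.
    Returns (jackknifed estimate, standard error, degrees of freedom).
    """
    k = len(groups)
    full = estimator([x for g in groups for x in g])
    pseudo_values = []
    for omitted in range(k):
        rest = [x for i, g in enumerate(groups) if i != omitted for x in g]
        pseudo_values.append(k * full - (k - 1) * estimator(rest))
    estimate = mean(pseudo_values)
    std_error = stdev(pseudo_values) / sqrt(k)
    return estimate, std_error, k - 1


# Made-up values for three hypothetical data sets, for illustration only.
example = [[0.10, 0.20], [0.30], [0.25, 0.15, 0.35]]
est, se, df = grouped_jackknife(example, mean)
print(f"estimate = {est:.3f}, standard error = {se:.3f}, df = {df}")
```

With ten data sets the same computation yields ten pseudo-values for each
regression coefficient, and the t-ratio (estimate divided by standard error)
is referred to nine degrees of freedom, as stated above.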

Table 4 presents the original and jackknifed estimates of all 
regression coefficients for both the reduced and product models.
Significance tests (t-ratios) are also shown. In the leftmost
column, the 18 variables included in both models are listed, and
the next four columns give conventional estimates and t-ratios
followed by jackknifed estimates and t-ratios for the reduced
(18-variable) model. The rightmost four columns give the same
information on these variables for the product model. For the 14 
crossproduct terms in the product model, only the jackknifed 
estimates and t-ratios are presented. These appear in the note 
at the end of the table. 

Reduced Model

As shown in Table 4, jackknifing showed 10 of the 18 coeffi-
cients from the reduced equation and 15 of the 32 coefficients
from the product model to be significant at the .05 level. The
jackknifed estimates are similar to the original regression
estimates; but the t values are somewhat lower on average,
indicating that the significance levels of the original estimates
are somewhat inflated as a result of the non-independence of
correlations from the same study. The majority of the independent 
variables in the jackknifed reduced equation, however, are 
significantly related to the magnitude of the correlations between 
LEI scales and learning outcomes. 

Among the variables representing location of the study, USA 
had a jackknifed b-weight significantly less than zero in the 
reduced model. Holding other things constant, correlations taken 
from studies conducted in the United States are estimated to be .22 
less than those from studies abroad. 

Grade level shows a small but persistent positive relation-
ship to correlation size; the average correlation rises roughly
.05 per year. Older high school students have longer school 
experience, usually attend more classes each week, and may thus 
be more astute raters of the class learning environment. 

With other variables in the reduced equation held constant, 
a clear trend appears in the size of correlations as a function 
of the unit of analysis. Although only the variable representing 
class as unit of analysis is significant by the jackknife procedure,
t-tests for student and subgroups as unit are nearly significant
at the .05 level, and reveal, in the context of the other variables,
a monotonic increase in the magnitude of the correlations with
increasing aggregation from student to subgroup to class to school
as unit of analysis. 

The strength of the LEI scale-outcome relation was found to 
be significantly higher for seven of the scales than for the 
others. This is shown by the t-ratios in Table 4 for the reduced 
model, jackknifed estimates, for Cohesiveness, Friction, Satis- 
faction, Favoritism, Goal Direction, Democracy, and Environment. 
Correlations of these scales with learning outcomes are estimated 
by the regression to be from .21 to .38 higher than for the other
eight scales, when all other factors are controlled. 

It is notable that type of correlation (simple, part, or 
partial) is not significant in either the one-way analyses of 
variance or the regression analyses. Differences are statistically 
undetectable between (1) simple correlations of perceptual scales
and learning outcomes and (2) part or partial correlations controll-
ing ability, cognitive, or affective pretests, or both. One
possible explanation for this finding is that the unreliability
of adjusted scores may compensate for increased control for
aptitudes. However, Anderson and Walberg (1974) show that IQ
contributes little to the prediction of adjusted gains in cogni-
tive, affective, and behavioral learning in several data sets
whereas LEI scales contribute substantially. Thus, correlations
may be unaffected by statistical controls because the scales in
fact measure determinants of learning that are independent of
aptitudes and pretests.

It is also notable that learning domain is not significant: 
the correlations of perceptions and cognitive outcomes do not 
differ significantly from those involving affective and behavioral 
learning outcomes. Thus, it appears that constructive aspects of 
class morale are equally associated with outcomes in all three 
domains rather than being associated with benefits in one domain 
sacrificed for losses in another. 

The reduced model in Table 4 may be used to estimate the sizes 
of correlations to be expected under specific conditions in future
research. This is done by adding together selected coefficients,
or multiples of coefficients. The coefficient for the constant,
.42, is always included. This value alone is the estimate for a
correlation from a study not in the United States, not in Austra-
lia, at the kindergarten level ("grade 0"), with school as the
unit of analysis, and not involving any of the twelve LEI scales
for which binary variables were included in the reduced model.
To estimate correlations at higher grade levels, the coefficient
for grade level, .05, times the grade (1 through 12) is added in.
Coefficients for binary variables are added in if the corresponding 
conditions obtain. For example, the estimate of a correlation 
at the tenth grade level, in the United States, with student as 
the unit of analysis, involving the satisfaction subscale would 
be .42 + 10 X .05 - .22 - .82 + .38, or .26. Further illustrations 
appear in Table 5. Least confidence can be placed in the estimates 
of correlations for elementary grades since only two of the ten
data sets were obtained at this level (Tables 2 and 3). Extreme
caution must be taken in extrapolating estimates beyond the data
ranges given in Tables 2 and 3; caution is also required for inter-
polated estimates within sparsely sampled data regions such as
those for India, elementary and junior-high grades, and part
correlations (Table 3) as well as subspaces of these regions,
which would require additional empirical investigation to enlarge 
the areas of confident estimation. 
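
The worked example for a tenth-grade United States sample can be reproduced
with a short sketch. Only the coefficients quoted in the text are included;
the remaining Table 4 coefficients are not shown here, and the dictionary
labels and function name are illustrative. The pairing of -.82 with student
as unit of analysis is inferred from the order of terms in the example.

```python
# Reduced-model coefficients quoted in the text (from Table 4).
CONSTANT = 0.42
GRADE_LEVEL = 0.05
BINARY_COEFFICIENTS = {
    "USA": -0.22,
    "student as unit": -0.82,  # pairing inferred from the worked example
    "Satisfaction": 0.38,
}


def estimated_correlation(grade, conditions):
    """Constant, plus grade times its coefficient, plus the coefficient
    of every binary condition that applies."""
    binary_part = sum(BINARY_COEFFICIENTS[name] for name in conditions)
    return CONSTANT + GRADE_LEVEL * grade + binary_part


estimate = estimated_correlation(
    10, ["USA", "student as unit", "Satisfaction"]
)
print(round(estimate, 2))  # 0.26, matching the example in the text
```

Calling the function with an empty list of conditions and grade 0 returns
the constant alone, the baseline case described above.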

Product Model

Entering significant interactions of grade level and of unit
of analysis with LEI scales changes the magnitude and significance
of the b-weights but not their sign (Table 4). For example,
the estimated b-weight for grade level changes from .05
(t = 2.80, p < .05) to .06 (t = 1.01, N.S.). The negative signs of
the coefficients for products of grade level with Cohesiveness 
and Satisfaction (footnote to Table 4) indicate that estimates of 
correlations involving these two affective perceptions of class 
morale increase more slowly with grade level than do correlations 
involving other scales. Confirmations of these trends in additional 
empirical investigations, particularly in the elementary grades, 
would support the interpretation that organizational and task 
aspects of the social environment strengthen and affective aspects
weaken relative to one another with increasing grade level. 

Six interactions of unit of analysis and LEI scale are 
significant (Table 4 footnote). As in the case of grade level, 
they call for qualifications of the general relations of percep- 
tions and learning; these interactions reveal stronger associa-
tions of some specific scales than others at certain levels of
analysis. Since only one study used schools as units and another 
used sub-groups within classes, these interactions do not warrant 
much interpretation until further empirical studies are conducted. 
Cronbach (1976 and personal communications) and Walberg (1976) 
have discussed substantive and methodological issues of varying 
empirical relations across units of analysis. Future investiga- 
tions can contribute to the understanding of these complexities 
if parallel analyses are conducted using the student, sub-group, 
class, and school as units of analysis. 

Although the individual b-weights for the product terms 
should be interpreted cautiously, on the whole, inclusion of 
statistically significant product terms improved the fit of the
model. This is indicated in Table 4 by the increase in the
multiple correlation from .57 for the reduced model to .71 for
the product model. Illustrative coefficient estimates derived
from each of the two models are presented in Table 5, together
with corresponding observed coefficients taken from two of the
original studies included in the research synthesis. These are
observed and estimated coefficients for U.S. classes, computed
for grades 4 (elementary) and 11 (high school). The signs and
magnitudes of the observed and estimated coefficients in Table 5
are in good agreement, given the standard errors for the two 
models as reported in Table 4. It should be noted that the 
estimates for high school samples are clearly more accurate, 
and more empirical work on elementary samples is in order. As 
would be expected, the estimates derived using the product model 
are in somewhat closer agreement with the observed values than 
those derived from the reduced model. The difference in goodness 
of fit of the two models is not large, however, and for most 
purposes either set of estimates provides a reasonable summary 
of the data structure as well as expected sizes of correlations 
for future empirical investigations. 

Conclusions 

Across ten data sets from four countries and in a variety of 
samples, subject matters, and methodological approaches, perceptual 
aspects of the social-psychological environment of learning are
correlated consistently in sign in their relation to cognitive,
affective, and behavioral learning outcomes with or without
statistical controls for ability, pretests, or both. Specifically,
these learning outcomes are positively associated with perceptions
of Cohesiveness, Satisfaction, Task Difficulty, Formality, Goal
Direction, Democracy, and Environment and negatively associated
with perceptions of Friction, Cliqueness, Apathy, Disorganization,
and Favoritism. As a set, these perceptions account for substan-
tial variance in learning outcomes, beyond that accounted for by
ability and pretest measures. 

The correlations differ significantly in magnitude across 
perceptual scales, units of analysis, nations, and grade levels, 
as well as combinations of scales and the other factors. Although 
these differences require further empirical investigation, the 
theoretical plausibility and incremental predictive validity of 
the scales, as well as their utility for further research and 
evaluation, seem warranted. Their causal relation to learning 
is plausible but unproven. Educators who doubt the causal 
relation, however, or who believe in the inherent value of learning 
environment properties as ends in themselves rather than as means 
to standard outcome measures may not need to fear sacrificing 
one for the other since they appear to go together. 

Table 1 

Statistics on Twelve Studies 

Study                                Number of      Number of    Number of 
                                     Correlations   Classes      Students 
--------------------------------------------------------------------------- 
Bardsley (1976)                            7             30          374* 
Fraser (1979)                             11            153*         541 
Moos and Moos (1978)                      10             19*         375 
Perkins (1976)                             5            108         3700 
Talmage and Walberg (1978)                 5             59*        1600 
Tisher and Power (1975)                  240             20          315* 
Walberg and Anderson (1968)               22             76         2600* 
Walberg and Anderson (1972)              150             64*        1600 
Walberg, Singh and Rasher (1977)          60            150*        3000 
   Subtotal                              510            679       14,105 

Walberg (1969a)                           84            144*        3700 
Walberg (1969b)                           84            144*        3700 
Walberg (1972)                            56            144         3700* 
   Subtotal                              224            144         3700 
   Total                                 734            823       17,805 

Note: Since three studies (Walberg, 1969a, 1969b, 1972) analyzed a single data set 
to explore predictability across units of analysis and other methodological variations, 
the correlations from these studies were combined in the regression analyses to give 
each data set equal weight. 

*The unit of analysis for each study is indicated with an asterisk. See text 
on the sub-group analysis in the Walberg, Singh, and Rasher (1977) study. See text 
on the use of school as unit of analysis in the Perkins (1976) study. 

Table 2 

Design Characteristics and Results of Twelve Studies 

Bardsley (1976) 
   Sample: 374 senior high students in Australia. Student is unit of 
      analysis. 
   Environment Measures: A single questionnaire item which corresponds to 
      the Formality subscale of the Learning Environment Inventory (LEI). 
   Outcome Measures: Series of adjustment variables: misfeasance, 
      self-estrangement, social powerlessness, value isolation, 
      meaninglessness, social isolation, task powerlessness. 
   Controls: None 
   Results: Correlations of "Rules" item with subscales are weak: 
      misfeasance = .01, self-estrangement = .14, social powerlessness = .24, 
      value isolation = .00, meaningfulness = .06, social isolation = .04, 
      task powerlessness = -.07. 

Fraser (1978) 
   Sample: 541 students in 20 seventh-grade general science classes, 
      Melbourne, Australia, 10 experimental and 10 traditional curriculum; 
      153 sub-groups classified by sex, socio-economic status, and ability 
      within classes as units. 
   Environment Measures: Modified, 55-item [illegible] 9 scales with 
      internal consistencies from .50 to .80. 
   Outcome Measures: Seven cognitive and affective outcome measures ranging 
      in internal consistency from .63 to .91. 
   Controls: Pretests, student attitudes, and instruction. 
   Results: In guided stepwise regression, aptitude alone produced Rs of 
      from .48 to .76; instruction raised these by .16 to .26; and LEI 
      raised R additionally from .17 to .47. Total R ranged from .81 to .86. 

Moos and Moos (1978) 
   Sample: 19 representative classes in a U.S. high school; classes as 
      units. 
   Environment Measures: 90-item, 9-scale Classroom Environment Scale with 
      internal consistencies from .67 to .86. 
   Outcome Measures: Attendance and average grade given in class. 
   Controls: None 
   Results: Absences correlated with Competition and Teacher Control; grades 
      correlated positively with Involvement, Affiliation, and Teacher 
      Support and negatively with Rule Clarity and Teacher Control. 

Perkins (1976) 
   Sample: About 3,700 elementary school students in grade 4 from 42 U.S. 
      schools; school was the unit of analysis. 
   Environment Measures: My School, which is identical to My Class; contains 
      5 subscales: Cohesiveness, Competitiveness, Friction, Difficulty, and 
      Satisfaction. 
   Outcome Measures: Five subscales of the Iowa Test of Basic Skills: 
      Vocabulary, Reading, Language Skills, Work Study Skills, Math Skills; 
      and average daily attendance. 
   Controls: Teacher attitude variance partialed out. 
   Results: Positive relationship between performance on My Class and 
      student performance on achievement tests when teachers' perception of 
      the school environment is removed. 

Talmage and Walberg (1978) 
   Sample: About 1600 elementary school students in grades 1, 2, 3, and 6; 
      59 classes as [illegible]. 
   Environment Measures: My Class, an elementary school version of the LEI 
      with 5 scales as specified above, reliabilities from .54 to .77. 
   Outcome Measures: Science Research Associates Reading Test Total Scores. 
   Controls: SRA alternate form pretest given one year earlier. 
   Results: [illegible] 

Tisher and Power (1975) 
   Sample: 315 junior high students in grade 9 from 20 classes in 
      Australia. Student was the unit of analysis. 
   Environment Measures: Modified version of 15 LEI scales with internal 
      consistencies from .53 to .82. 
   Outcome Measures: Achievement in content areas, attitudinal measures, 
      and satisfaction with teaching methods. 
   Controls: 7 selected pretests: 1) ecology, 2) population ecology, 
      3) pollution, 4) population, 5) resources, 6) interest, 7) learning. 
   Results: Attitudes: pretest-posttest r of .33 raised to .48; satisfaction 
      and achievement: [partly illegible] raised to .47, .22 raised to .46. 

Walberg and Anderson (1968) 
   Sample: 2,100 students in 76 high school physics classes in U.S.; 
      student as unit. 
   Environment Measures: 80-item, 18 factor analytically-derived scales on 
      the Classroom Climate Questionnaire with internal consistencies 
      ranging from .41 to .86. 
   Outcome Measures: 9 cognitive, affective, and behavioral posttests 
      regression-adjusted for corresponding pretests, ranging in internal 
      consistency from [illegible] to .86. 
   Controls: None 
   Results: 20 percent of the 162 intercorrelations significant at .05; 
      achievement positively correlated with intimacy, negatively correlated 
      with goal diversity and social heterogeneity; understanding negatively 
      correlated with stratified; affect positively correlated with 
      democratic, goal direction, [illegible]-egalitarian, and satisfaction 
      and negatively with stratified, friction, disorganized, social and 
      interest heterogeneity. 

Walberg (1969a) 
   Sample: 3,700 students in national sample of 144 high school physics 
      classes; class as unit. 
   Environment Measures: Learning Environment Inventory: fourteen scales 
      with internal consistencies from .58 to .86 for individuals and .43 
      to .84 for classes. 
   Outcome Measures: Test on Understanding Science, internal consistency 
      .76; Welch Science Process Inventory, Physics Achievement Test, .77; 
      Academic Interest Measure, .91; Pupil Activity Inventory, .80; 
      Semantic Differential, .86; all given at the beginning and end of a 
      one-year course. 
   Controls: IQ, pretest achievement and interest. 
   Results: On three cognitive criteria, median R with controls of .66 
      raised to .72; on three non-cognitive criteria, median R raised from 
      .40 to .51; on achievement, .71 to .73. 

[Author and date illegible] 
   Sample: Same as Walberg (1969a), but student as unit. 
   Outcome Measures: Selected pretests and posttests on understanding, 
      achievement, interest, and activities. 
   Controls: Corresponding pretest. 
   Results: R with controls raised from .68 to .69 for understanding, .73 
      to .75 for achievement, .70 to .70 (not significant) for physics 
      interest, and .75 to .76 for voluntary physics activities. 

Walberg and Anderson (1972) 
   Sample: About 1600 students in 64 Montreal high school classes in 8 
      subjects; class as unit. 
   Environment Measures: The fifteen LEI scales. Reliabilities from .58 to 
      .86. 
   Outcome Measures: Standardized Quebec High School Learning Examinations; 
      internal consistencies range from .70 to .80. 
   Controls: IQ 
   Results: In split-sample double cross-validations, r with controls raised 
      to R with LEI and cross-validated: .42 to .87 and .67 in sample A and 
      .24 to .78 and .43 in second sample. 

Walberg, Singh and Rasher (1977) 
   Sample: Somewhat less than 3,000 students in 300 studious and 
      non-studious sub-groups of 10 students each in 83 science and 67 
      social studies classes in Rajasthan, India High Secondary schools; 
      sub-group as unit. 
   Environment Measures: The fifteen LEI scales. Reliabilities from .58 to 
      .86. 
   Outcome Measures: 100-item multiple-choice achievement tests geared to 
      standard curriculum; general science internal consistency, .67; 
      social studies, .81. 
   Controls: IQ 
   Results: In general science, r of .63 raised to R of .82; in social 
      studies, .61 to .90. 

Walberg (1969) 
   Sample: Same as Walberg (1969a). 
   Environment Measures: Same as above. 
   Outcome Measures: Same as above. 
   Controls: Corresponding pretest. 
   Results: R of regression-residualized gain score with 14 LEI scales: .45 
      for cognitive criteria; .41 for non-cognitive criteria; .43 for 
      achievement. 

Table 3 

Descriptive Statistics for Correlations 
Between Educational Outcomes and Learning Environments 

Factor                            Mean           Standard      Frequency 
                                  Correlation    Deviation 
------------------------------------------------------------------------- 
Location* 
   USA                               .10            .15           266 
   Canada                            .26            .37           150 
   Australia                         .06            .09           258 
   India                             .32            .49            60 

Grade Level 
   Elementary                        .12            .36            10 
   Junior High                       .14            .13            11 
   High School                    [illegible]       .26           713 

Sample Size* 
   40-299                            .25            .40           120 
   300-499                           .06            .12           257 
   500-999                           .05            .08            67 
   1,000-3,703                       .18            .27           290 

Unit of Analysis* 
   Students                          .07            .10           325 
   Subgroups                         .29            .46            71 
   Classes                           .17            .29           333 
   Schools                           .30            .31             5 

Type of Correlation 
   Simple                            .15            .29           442 
   Part                              .11         [illegible]       11 
   Partial                           .12            .20           281 

Learning Environment Scale* 
   Cohesiveness                      .23            .27            50 
   Friction^a                        .23            .23            53 
   Cliqueness^a                      .12            .19            46 
   Satisfaction                      .22            .21            54 
   Speed^a                           .02            .31            48 
   Task Difficulty                   .13            .24            50 
   Apathy^a                          .14            .32            48 
   Favoritism^a                      .16            .16            46 
   Formality                         .06            .26            57 
   Goal Direction                    .17            .23            51 
   Democracy                         .17            .24            50 
   Disorganization^a                 .13            .22            50 
   Diversity                         .02            .73            47 
   Environment                       .18            .26            49 
   Competition                       .06            .38            35 

Outcome Domain* 
   Cognitive                         .17            .33           403 
   Attitudinal                       .10            .11           284 
   Behavioral                        .07            .13            47 

Content Area* 
   General Science                   .12         [illegible]      133 
   Life Sciences                     .05            .11           165 
   Physical Sciences                 .12            .19           279 
   Mathematics                       .38            .37            15 
   Social Sciences                   .34            .50            60 
   Humanities                        .15            .35            35 
   General Achievement               .25            .28            40 
   Miscellaneous                     .08            .08             7 

*Factors significant at the .001 level are indicated with an asterisk. 

^a Signs of correlations reversed for these LEI scales. 

Table 4 

Conventional and Jackknifed Regression Statistics for the Two Models 

Column layout of the full table: 

                     Reduced Model                 Cross-product Model 
               Conventional    Jackknifed      Conventional    Jackknifed 
Variable       b-weight   t    b-weight   t    b-weight   t    b-weight   t 

For the fifteen variables below, only three of the eight columns are 
legible in the source. In the scan they follow the cross-product model 
headings: one conventional t column, then the jackknifed b-weight and t. 

Variable               Conventional     Jackknifed    Jackknifed 
                       t                b-weight      t 
------------------------------------------------------------------ 
USA                     -[?].72           -.25          -2.33 
Australia               -5.26             -.19          -1.54 
Grade Level             [illegible]        .06           1.01 
Student as Unit         -3.03             -.28          -1.61 
Subgroup as Unit        -3.94             -.42          -1.98 
Classroom as Unit       -5.41             -.27          -3.09 
Cohesiveness             9.76              .85          16.38 
Friction^a               6.40              .42           1.19 
Cliqueness^a             2.18              .08            .77 
Satisfaction             9.50              .81           9.10 
Task Difficulty         10.12              .03           3.36 
Apathy^a                 3.51              .14           1.62 
Favoritism^a             2.86              .12           1.24 
Formality                 .20              .04            .60 
Goal Direction           3.10              .20           2.32 

Table 4 continued 

                         Reduced Model                  Cross-product Model 
                    Conventional     Jackknifed      Conventional     Jackknifed 
Variable            b-weight    t    b-weight    t   b-weight    t    b-weight    t 
------------------------------------------------------------------------------------ 
Democracy             .30     5.93     .32     3.04    .14     2.79     .17     2.18 
Disorganization^a     .17     3.73     .16     1.63    .08     1.89     .06      .74 
Environment           .28     6.14     .31     3.06    .13     2.16     .12     1.37 
Constant              .11              .42     1.19   -.20              .05      .18 
Standard Error        .247             .259            .214             .225 
R                     .571             .533          [illegible]      [illegible] 

^a Signs of correlations for these LEI scales reversed in all analyses. 

[A second note, listing b-weights with t-values in parentheses for the 
significant product terms, is largely illegible in the source. Legible 
fragments: -.33 (5.95), Friction .41 (1.75), and Task Difficulty -.94 (3.01).] 

Table 5 

Correlations of Learning Environment with Achievement at Two Grade Levels for U.S. Classes 

                            Elementary                           High School 
                     Reduced    Product               Reduced    Product    Observed   Observed 
LEI Scale            Estimate   Estimate   Observed   Estimate   Estimate   History    Physics 
------------------------------------------------------------------------------------------------ 
Cohesiveness           .27        .17        .00        .63        .38        .81        .93 
Friction^a            -.29       -.52       -.37       -.65       -.80       -.90       -.75 
Cliqueness^a                                           -.45       -.27       -.74        .04 
Satisfaction           .30        .38        .11        .66        .45        .63        .63 
Task Difficulty        .19       -.14       -.27      [illeg.]     .28        .26        .53 
Apathy^a                                               -.39       -.33       -.86       -.79 
Favoritism^a                                           -.50       -.31       -.59      [illeg.] 
Formality                                               .39        .23       -.12      [illeg.] 
Goal Direction                                          .62        .19        .66        .44 
Democracy                                               .59        .36        .61        .50 
Disorganization^a                                      -.44       -.25       -.71       -.42 
Environment                                             .59        .55        .80        .88 
All others            -.08       -.23       -.51        .28        .19       -.15       -.57 

^a Signs of correlations for these LEI scales were reversed in all analyses. The 
original signs, however, have been restored in this table. Observed correlations 
were obtained from Talmage and Walberg (1978) and Walberg and Anderson (1972). 

A Methodology for Research Synthesis in Science Education 

Over the past 50 years, science educators have periodically reviewed 
and organized the research literature on science learning. Most of the 
reviews were designed as comprehensive summarizations of the literature over 
a specified time period. Mallinson (1977) described these past efforts 
starting with A Digest of Investigations in the Teaching of Science in the 
Elementary and Secondary Schools (Curtis, 1926) through Mallinson's (1977) 
A Summary of Research in Science Education - 1975. Though valuable as 
comprehensive summaries, the past reviews are difficult to compare due to the 
absence of a common model or set of constructs defining the major categories 
of variables influencing science learning. This absence has meant that gaps 
in the research often go unnoted, since each reviewer develops a unique 
organization of material based on the trends and priorities of the period and 
the reviewer's point of view. Another limitation of these qualitative 
reviews is loss of the quantitative aspects of the accumulated studies. 
Advocating the quantitative synthesis of research, Light (1971) commented, 
"Little headway can be gained by pooling the words in the conclusions of a 
set of studies" (p. 443). 

The purpose of this paper is to present a model to guide reviews of 
research on science learning, a methodology for the quantitative synthesis of 
studies, and a summary of the results of an application of the model and 
methodology. 




A Model to Guide Reviews 

A model to guide reviews of research on science learning should 
consist of a manageable number of constructs, reasonably comprehensive in 
explaining the observed variance in science learning. The set should 
correspond closely to past and present categories of research on factors 
affecting learning while allowing for the subsumption of new variables under 
the constructs. The set should provide for the inclusion of variables of 
immediate influence on learning, e.g., teacher reinforcement of student 
behavior, as well as variables representing important but less direct 
influence, e.g., parent education. It would be unrealistic, however, to expect 
the set to account for all predictable variance, given the multiplicity and 
complexity of factors that affect learning. 

Ideally, a widely accepted theoretical model would provide the 
framework for empirical literature reviews. But such a model is not available, 
nor does there appear to be serious study of this problem within 
contemporary science education research (Peterson & Carlson, 1979, p. 506). In the 
absence of a theoretical model, a set of constructs might be identified through 
examination of general education empirical research. Variables found to be 
substantially related to learning could be organized into a manageable 
number of major construct categories. 

Bloom's (1976) search and analysis of the research literature led to 
three major immediate factors influencing learning within the classroom: 
student affective entry characteristics, student cognitive entry behaviors, 
and the quality of instruction. This set of factors appears sufficiently 
comprehensive but does not make explicit how these constructs correspond 
to past and present categories of research nor how they incorporate 
out-of-classroom factors. 

Walberg (1979) developed a larger though still manageable set of 
constructs which make explicit the subcategories implied in Bloom's set. Walberg's 
eight constructs are: student ability, motivation, and age or developmental 
level; the quality and quantity of instruction; and the home, peer, and 
classroom psychological environments. This list may be further revised and 
refined, but it provides for the major interrelated factors which the empirical 
literature would support as significant correlates of learning (Bloom, 1976, 
1980; Comber & Keeves, 1973; Rosenshine, 1979). The list has the advantage 
of a close relationship with the major schools of empirical educational 
inquiry over the past three-quarters of a century, allowing the quantitative 
synthesis of many past studies to form estimates of the degree of association 
with or influence on learning by each construct. 

Though certainly not the only possible choice, Walberg's eight con- 
structs were adopted as the framework for this research synthesis with the 
hope that the results would support their routine use. That is, it was 
hypothesized that constructs important to learning in general would be im- 
portant in science learning as well. If so, these constructs might routine- 
ly form the core of bivariate and multivariate studies in the future. 

A Methodology for Quantitative Synthesis 

Several different approaches to the quantitative synthesis of research 
have been proposed in recent years. Light (1971) recommended a cluster 
approach wherein the data from studies of similar high quality and 
instrumentation are combined. Gage (1978) used a technique of converting the p-value 
(significance level) of comparisons of similar treatments on the same 
dependent variable into a form of the chi-square statistic allowing studies to be 
combined. Glass (1978) described a method for combining reported correlations 
or calculated effect sizes across related studies. 
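
To make the p-value approach concrete, the following is a minimal sketch 
of one standard way of turning a set of independent p-values into a 
chi-square statistic, namely Fisher's method. The text does not give Gage's 
exact procedure, so the choice of Fisher's method, the function name, and the 
example p-values are all assumptions made for illustration. 

```python
import math


def fisher_combined_chi_square(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi_square, degrees_of_freedom). Under the joint null
    hypothesis the statistic follows a chi-square distribution with
    2k degrees of freedom, where k is the number of studies.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
    chi_square = -2.0 * sum(math.log(p) for p in p_values)
    return chi_square, 2 * len(p_values)


# Hypothetical p-values from three studies of the same treatment.
chi_square, df = fisher_combined_chi_square([0.04, 0.20, 0.01])
print(f"chi-square = {chi_square:.2f} on {df} degrees of freedom")
```

The combined statistic would then be referred to a chi-square table with 
the stated degrees of freedom. 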

Each approach to quantitative synthesis has strengths and weaknesses. 
Light's ideal of using original data from closely comparable, high quality 
studies greatly limits the number of useful studies while increasing the 
time and effort requirements. Gage's use of p-values and Glass's use of 
correlations and effect sizes violate some assumptions of sampling and 
statistical comparability but provide estimates of effects and directions at a 
level of accuracy probably appropriate to the general quality of the data in 
the original studies, and at an effort level which makes the synthesis practical. 
Rosenthal (1978) has presented several variations on the above three approaches. 

Regardless of the effort required and the precision of the methodology, 
the advantage of quantitative synthesis is the possibility of increased 
objectivity, precision, and conciseness in reporting quantitative outcomes and 
trends compared to a purely qualitative treatment. Objectivity is gained 
through the use of coding schemes which allow different raters to arrive at 
reliably similar characterizations of several features of a given study. 
Quantification also improves precision since features of a study (study-variables) 
can be coded at several gradations, providing a finer discrimination among, and 
comparability across, studies than qualitative statements would allow. 
Finally, quantification gives a conciseness to the integration of a set of studies, 
yielding, where a sufficient number of studies is available, a regression 
equation predicting study outcomes under multiple study conditions. 
Quantitative synthesis should be viewed, however, as supplementing 
rather than replacing qualitative review. The quantification of any 
variable ultimately rests on qualitative descriptions of what the numbers represent 
and how the measurements were conducted. Research, as well as research 
synthesis, begins with the qualitative and moves toward quantification as greater 
objectivity, precision, and conciseness are sought. 

Glass's approach to quantitative synthesis was chosen for this study since 
the information needed for coding a study can be extracted from the published 
study report, and correlations and effect sizes provide a measure of the strength 
of a relationship, not simply whether or not it is statistically significant. 
In fact, a strong argument can be made that effect size should replace alpha 
level as the most important outcome in experimental studies (Cohen & Hyman, 1980). 

A Model-Guided Quantitative Synthesis 

A quantitative synthesis of research in science education was conducted, 
guided by Walberg's eight constructs and using Glass's methodology. The 
remainder of this paper describes the adaptation of the methodology to this 
research synthesis. 

The purpose of the synthesis was to develop sound approximations of the 
magnitude of the relationship between each construct and grade 6 through 12 
student learning in science. Literature selection was restricted to the 1963 
through 1978 period, a time of major curriculum reform and increase in the 
quality and quantity of research. The grade 6 through 12 levels were chosen 
to include the usual range of science course offerings in the precollege 
curriculum, beginning with required science in the junior high school and 
terminating with elective science in the senior high school. This age group is 
also characterized by the transition for many students from concrete to 
formal operational thinking (Inhelder and Piaget, 1958), an important research 
topic of the period. 

Coding 

The quantitative synthesis of research requires the development of a 
coding scheme. The coding scheme should summarize in numerical form the 
characteristics of the subjects, the setting, the independent and dependent 
measures, the research design and threats to validity, and the reported 
strength and direction of the relationship between the variables under study. 

In this synthesis, the coding schemes for the eight constructs were 
identical with the exception that each construct had a special section for 
coding the independent variable. Aside from source information, the typical 
number of characteristics coded was approximately 40. A 90 percent agreement 
was readily attained between coders on a sampling of independently read 
studies. 
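
As an illustration of the agreement figure, the sketch below computes simple 
percent agreement between two coders over the same set of coded 
characteristics. The text does not say how agreement was calculated, so 
item-by-item matching is an assumption, and the function name and codes are 
hypothetical. 

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of coded characteristics on which two coders agree."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same characteristics")
    if not codes_a:
        raise ValueError("no characteristics were coded")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100.0 * matches / len(codes_a)


# Hypothetical codes assigned by two coders to ten characteristics of one study.
coder_1 = [1, 3, 2, 2, 0, 1, 4, 2, 1, 3]
coder_2 = [1, 3, 2, 1, 0, 1, 4, 2, 1, 3]
print(percent_agreement(coder_1, coder_2))
```

Percent agreement does not correct for chance agreement, which matters most 
for characteristics with only a few coding categories. 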

To provide a sense of the degree of discrimination and detail in the cod- 
ing scheme, three sections are briefly discussed here. 

1) Dependent measure. Eight categories of dependent measures were chosen 
for coding based on a sampling of the research literature and a desire 
to be comprehensive. The categories were cognitive achievement, factual 
learning, conceptual learning, process learning, logical operations, 
creative or critical thinking, attitudes and interests, and lab performance. 
(Each kind of measure is operationally defined in the study code book.) 
Most often, the label given a measure by the author of the study was 
accepted as the proper classification, even though it is known that the 
many kinds of cognitive measures have similar items and a large amount 
of shared variance. Later analysis always required some combining of 
these categories due to an insufficient number of studies with measures 
in any one category. 

Four other characteristics of the dependent measure were coded: 
the type of measure (general, science, specific discipline, specific 
course); whether a locally constructed or a published instrument; the 
reliability; and a judgment about the validity (adequate or inadequate). 

2) Study design characteristics and threats to validity. Glass (1978) 
proposed the coding of individual study design and analysis features 
which might have influenced study results. Once coded, the covariance 
of those study-variables with study findings could be examined, making 
full use of statistical methods. In the present synthesis, various 
aspects of each study's research design were coded. These design factors 
included the threats to experimental validity identified by Cook and 
Campbell (1976), and are summarized in Table 1. 

3) Quantitative relationships. The value, sign, and level of significance 
of each reported correlation was recorded. Where an experiment or 
quasi-experiment was involved, the effect size and direction and the 
level of significance of the statistical test were recorded. 

Nearly all effect sizes were computed using one of the following 
two formulas from Glass: 

\[ ES = \frac{\bar{X}_e - \bar{X}_c}{S_c} \qquad\qquad ES = t \sqrt{\frac{1}{n_e} + \frac{1}{n_c}} \]

\bar{X}_e and \bar{X}_c represent experimental and control group means respectively. 
S_c is the standard deviation of the control group, t is the computed 
t-test statistic, and n_e and n_c are the experimental and control group 
sample sizes. If an F-test was used in a one-way analysis of 
variance to compare two groups, the F value was considered equal to t^2. 

\ 

\ 
If only the total sample size was given, it was assumed that n_e = n_c, 
since equal n's provide a more conservative estimate of effect size than 
unequal n's. Finally, in cases where one-way analysis of variance 
was used, homogeneity of variances was assumed, setting S_c = \sqrt{MS_w}. 
Two-way analysis of variance tables, however, without other statistics, were 
insufficient for computing effect sizes. 
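
The effect-size rules described above can be sketched as follows. This is 
an illustration under stated assumptions rather than the authors' own 
procedure: the conversion from t is written in the commonly used form 
ES = t * sqrt(1/n_e + 1/n_c), which is assumed here because the printed 
formula is only partly legible, and all function and parameter names are 
hypothetical. 

```python
import math


def effect_size_from_means(mean_e, mean_c, sd_c):
    """Mean difference in units of the control-group standard deviation."""
    return (mean_e - mean_c) / sd_c


def effect_size_from_t(t, n_e, n_c):
    """Effect size recovered from a two-group t statistic (assumed form)."""
    return t * math.sqrt(1.0 / n_e + 1.0 / n_c)


def effect_size_from_f(f, n_total, favors_experimental=True):
    """Two-group one-way ANOVA: treat F as t squared and assume equal n's.

    F carries no sign, so the direction of the effect must be supplied.
    """
    t = math.sqrt(f)
    if not favors_experimental:
        t = -t
    half = n_total / 2.0
    return effect_size_from_t(t, half, half)


# Hypothetical values for illustration only.
print(effect_size_from_means(55.0, 50.0, 10.0))
print(effect_size_from_t(2.0, 50, 50))
print(effect_size_from_t(2.0, 20, 80))  # same t and total n, unequal groups
print(effect_size_from_f(4.0, 100))
```

Comparing the second and third calls shows why assuming equal group sizes is 
the conservative choice: for a fixed t and total sample size, the equal split 
gives the smaller effect size. 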

Code sheets were devised for recording all above and additional infor- 
mation about each study. Several code sheets were often necessary for a single 
study depending on the number of effect sizes computed, correlations reported, 
and whether data were reported separately for different study-variables such 
as grade or ability level. 
Analysis 

Once coding is completed, the analysis of the coded data can take a 
variety of forms depending on the quantity of data and the researcher's 
inclination toward the liberal application of statistical methods. A first 
step is to decide how to deal with the problem of the non-independence of 
multiple correlations or effect sizes extracted from the same study. One 
solution is weighting each correlation or effect size inversely to the 
number of each extracted from a given study. A study contributing 10 
correlations receives the same total weight as a study contributing three. A second 
solution, the one used in this study, is to select only the median value 
correlation or effect size from each study. This procedure will greatly 
reduce the data set, but will equalize the contribution of each study. 
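
The two solutions can be sketched as below. The data and the function names 
are hypothetical, and the sketch is meant only to show how each approach 
gives every study the same total influence. 

```python
from statistics import median


def equal_study_weight_mean(values_by_study):
    """Mean in which each value is weighted by 1/k, where k is the number
    of values its study contributed, so every study has total weight 1."""
    weighted_sum = 0.0
    for values in values_by_study.values():
        weight = 1.0 / len(values)
        weighted_sum += weight * sum(values)
    return weighted_sum / len(values_by_study)


def median_per_study(values_by_study):
    """Keep only the median correlation or effect size from each study."""
    return {study: median(values) for study, values in values_by_study.items()}


# Hypothetical correlations extracted from two studies.
correlations = {"study_a": [0.10, 0.20, 0.30], "study_b": [0.50]}
print(equal_study_weight_mean(correlations))
print(median_per_study(correlations))
```

An unweighted mean of the four values would let the first study count three 
times as heavily as the second; both approaches above remove that imbalance. 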

Many more questions can be raised about the appropriate weighting of 
studies. For example, should a study with an n of 30 receive the same weight 
as one with an n of 2000? Should a high quality true experiment receive the 
same weight as a low quality quasi-experiment? Answers to questions like 
these might be framed as hypotheses prior to analysis; an empirical answer 
is available in the analysis itself. Correlations from large sample studies 
can be compared to correlations from smaller sample studies; true experiment 
studies can be compared to quasi-experiment studies, and so on for any 
study-variable the researcher wishes to code. 

Once the weighting problem has been resolved (whether to use weighted 
or median correlations and effect sizes), analysis involves grouping code 
sheets of studies with the same independent and dependent variables and 
associated statistic, i.e., correlation or effect size; treating the 
statistic as a dependent variable; and relating the values of this dependent variable 
to different study-variable conditions. Questions like the following can be 
addressed: What is the average correlation or effect size across studies? 
Does the reported correlation (dependent variable) vary in a systematic way 
with sample size, outcome measure reliability, or the ability level of the 
subject sample? Or is the mean correlation fairly constant across variations 
in study-variables? Since several independent study-variables and a single 
dependent variable (correlation or effect size) have been quantified, t-tests, 
F-tests, correlations, and regressions can be conducted to characterize the 
relationship among the variables. Table 2 summarizes the literature 
selection and central tendencies of the correlations or effect sizes in this study. 
References to separate, more detailed reports of each synthesis are given in 
the table. 
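
A minimal sketch of this kind of analysis follows: each coded correlation is 
treated as a data point and summarized within the levels of one 
study-variable. The record layout, field names, and values are hypothetical 
and are not taken from the project's code book. 

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by(records, study_variable):
    """Return {level: (mean r, standard deviation, count)} for one study-variable."""
    groups = defaultdict(list)
    for record in records:
        groups[record[study_variable]].append(record["r"])
    summary = {}
    for level, values in groups.items():
        # A standard deviation needs at least two values.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[level] = (mean(values), spread, len(values))
    return summary


# Hypothetical coded correlations with one study-variable (unit of analysis).
records = [
    {"r": 0.20, "unit": "class"},
    {"r": 0.40, "unit": "class"},
    {"r": 0.10, "unit": "student"},
]
for level, (mean_r, spread, count) in summarize_by(records, "unit").items():
    print(level, round(mean_r, 2), round(spread, 2), count)
```

Repeating the summary for each coded study-variable gives the kind of 
breakdown by mean, standard deviation, and frequency that a descriptive table 
of correlations would report. 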

The extent of the analysis conducted on the data in each construct was
a function of the number of data points available. The 734 correlations and
related set of study-variables in the classroom environment construct allowed
extensive regression procedures wherein the contribution of each study-variable
to variation in the set of correlations could be estimated. Analysis of the
impact of study-variables on the 67 ability construct correlations took the
form of a series of t-tests wherein each study-variable was dichotomized into
high and low or two nominal categories, each containing approximately the same
number of median correlations.
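
A minimal sketch of the dichotomize-then-test procedure follows. It assumes a
median split and a pooled-variance two-sample t statistic; the report does not
state which form of t-test was used, and the records and field names are
invented for illustration.

```python
from math import sqrt
from statistics import mean, median, variance

# Hypothetical records: a median correlation and one coded study-variable per study.
records = [
    {"corr": 0.30, "reliability": 0.60},
    {"corr": 0.35, "reliability": 0.65},
    {"corr": 0.40, "reliability": 0.70},
    {"corr": 0.50, "reliability": 0.80},
    {"corr": 0.55, "reliability": 0.85},
    {"corr": 0.60, "reliability": 0.90},
]

def median_split(rows, study_variable, statistic="corr"):
    """Dichotomize a study-variable at its median; return (low, high) statistics."""
    cut = median(r[study_variable] for r in rows)
    low = [r[statistic] for r in rows if r[study_variable] <= cut]
    high = [r[statistic] for r in rows if r[study_variable] > cut]
    return low, high

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (an assumption here)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

low, high = median_split(records, "reliability")
t_stat = pooled_t(high, low)  # positive when the high group has larger correlations
```

The t statistic would then be referred to a t distribution with
len(low) + len(high) - 2 degrees of freedom.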

A study-variable of particular interest in the quality of instruction
construct studies was design quality. A design quality index representing
a summation of positive design features (features minimizing threats to
validity, i.e., Table 1) was significantly (p = .09) correlated (r = .21) with
effect size. That is, the better the study design, the greater the difference
between experimental and control group means as measured by effect size.
Better study design meant a greater effect favorable to the experimental group.

A study-variable which systematically influenced the ability and cognitive
achievement correlation was the reliability of the two instruments.
For example, the reliability of the outcome instrument was correlated with the
correlation between ability and cognitive achievement at an r value of .33.

Summary 

The purpose of this paper was to provide an overview of a model and method- 
ology for quantitative research synthesis in science education and a summary 
of their application. 

The model included a comprehensive set of constructs to guide the literature
review and to help identify important groups of variables not receiving
sufficient research attention. The quantitative methodology added objectivity,
precision, and conciseness to the traditional qualitative review.
Quantification allowed the reporting of mean correlations and effect sizes and
examples of how these statistics vary with study-variables such as design
quality and instrument reliability. Detailed reports were identified which
the interested researcher might consult for an in-depth treatment of the
quantitative integration of research summarized here.
Code Categories for
Study Design Characteristics

Sample selection
   1 = simple or stratified random
   2 = purposive sample, e.g., extreme or specialized group
   3 = matching
   4 = convenience or ill-specified sample

Unit of analysis
   1 = individual
   2 = group

Study design (Campbell & Stanley)
   1 = correlational
   2 = quasi-experimental
   3 = true experimental (random assignment)

Reliability of treatment implementation
   1 = low; treatment and implementation poorly described and documented
   2 = adequate; treatment and implementation clearly described and
       documented
   3 = high; treatment described or documented with observational checks
       on implementation

Statistical power
   1 = inadequate
   2 = adequate, i.e., 6 or more classes; 100 or more individuals total in
       2 comparison groups or in correlational group

Error rate (Given the number of comparisons or correlations, is the overall
p level sufficiently low to assure a less than .05 chance occurrence of this
particular relationship?)
   1 = inadequate p level
   2 = adequate, i.e., p less than .05

Maturation (Have factors within units rather than the treatment brought
about the difference observed?)
   1 = probable threat
   2 = adequately minimized

History (Have external factors in the environment rather than the
treatment brought about the differences observed?)
   1 = probable threat
   2 = adequately minimized

Selection Bias (Do pre-existing differences among the groups account
for later observed differences?)
   1 = probable threat
   2 = adequately minimized

Contamination, Compensation, Differential Incentive (Do untreated
control groups work harder, work less, or somehow gain benefits or
lose incentive due to influences from treated groups or teachers?)
   1 = probable threat
   2 = adequately minimized

Mortality (Do different dropout rates account for observed differences?)
   1 = probable threat
   2 = adequately minimized

Generalizability (Can results be generalized to other times, units,
or settings with similar demographic characteristics?)
   1 = probable threat
   2 = adequately minimized
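Table 2

Summary of Sources and Findings on the Relationship of
Each Construct to Cognitive Learning in Science

Construct: Developmental level
   Measures: Chronological age (within grade)
   Detailed report: Boulanger & Kremer, 1980
   Number of studies: 3
   Sources searched: JRST, SE, DA, ERIC
   Data points (number and kind): 3 median corr.
   Mean corr. or effect size: median r = -.01
   Comment: Age is a sign. pos. predictor only if multi-grade level data.

Construct: Developmental level
   Measures: Piagetian stage or logical operations (within grade)
   Detailed report: Boulanger & Kremer, 1980
   Number of studies: 6
   Sources searched: JRST, SE, DA, ERIC
   Data points (number and kind): 13 median corr.
   Mean corr. or effect size: median r = .40
   Comment: This correlation peaked at .69 in grade 9.

Construct: Ability
   Measures: IQ, gen. aptitude, prior achieve., quant.-spatial
   Detailed report: Boulanger (1980a)
   Number of studies: 34
   Sources searched: JRST, SE, SSM, ERIC
   Data points (number and kind): 62 median corr.
   Mean corr. or effect size: median r = .48
   Comment: This construct gave the highest and most reliable mean corr.

Construct: Motivation
   Measures: Self-concept, need-achiev., persistence, test anxiety
   Detailed report: Kremer & Walberg, 1980
   Number of studies: 5
   Sources searched: JRST, SE, SSM, DA, ERIC, SSCI
   Data points (number and kind): 5 median corr.
   Mean corr. or effect size: median r = .37
   Comment: Higher corr. were obtained with standardized over locally made
      scales.

Construct: Home environment
   Measures: Parent SES, science equip. and documents in home; parent
      involvement, education
   Detailed report: [illegible]
   Number of studies: 13
   Sources searched: JRST, SE, SSM, DA, ERIC, SSCI
   Data points (number and kind): 6 median corr.
   Mean corr. or effect size: median r = .24
   Comment: Parent education and aspiration and involvement with child's
      science best predictors (r = .36).

Construct: Peer environment
   Measures: Rapport with peers, peer partic. in science
   Detailed report: [illegible]
   Number of studies: [illegible]
   Sources searched: [illegible]
   Data points (number and kind): 2 median corr.
   Mean corr. or effect size: median r = .25
   Comment: Studies too few and too diverse to note trend in best predictors.

Construct: Quality of instruction
   Measures: Use of adv. organizers, beh. obj., concrete materials, higher
      structure, indirect and inductive strategies, training in thinking
   Detailed report: Boulanger (1980b)
   Number of studies: 51
   Sources searched: JRST, SE, SSM, ERIC (pub. only)
   Data points (number and kind): 52 median effect sizes
   Mean corr. or effect size: ES = .55; equivalent r = .25 - .30
   Comment: Significant effects due to: behav. obj., concrete materials,
      higher structure, training in thinking.

Construct: Quantity of instruction
   Measures: Class periods spent on teaching the content
   Detailed report: [illegible]
   Number of studies: [illegible]
   Sources searched: JRST, SE, SSM, ERIC
   Data points (number and kind): 3 corr.
   Mean corr. or effect size: r = .19
   Comment: None of three studies gave sign. corr.

Construct: Classroom environment
   Measures: Student perception of several classroom social variables
   Detailed report: Haertel, Walberg & Haertel, 1979
   Number of studies: 12
   Sources searched: ERIC, DA, EI, DA, SSCI
   Data points (number and kind): 7 mean corr. based on 353 raw corr.
   Mean corr. or effect size: r = .19
   Comment: Sign. predictors were: cohesiveness, low friction, satisfaction,
      low favoritism, goal direction, democracy, and material environment.

Acronym code: JRST, Journal of Research in Science Teaching; SE, Science
Education; SSM, School Science and Mathematics; DA, Dissertation Abstracts;
ERIC, Educational Resource Information Center (Science) bibliographies and
computer search; SSCI, Social Science Citation Index; EI, Education Index.
Pub. only means only published studies included.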
Toward A Synthesis of Research Findings
on Sex Differences in Science Learning

That far fewer women than men pursue careers in mathematics and science
and reputedly demonstrate far lower scores on tests of aptitude and
achievement in these areas has, until recently, been accepted as a natural
consequence of innate sex differences in aptitudes for those fields.
Thorndike (1973) noted that in none of the countries he surveyed (including
the U.S.A.) did girls do as well as boys in science. Differences were observed
on the order of half a standard deviation.

Investigations as recent as those of Stafford (1972) and Page (1976)
have suggested sex-linked, hereditary hypotheses in explaining differences
in male-female achievement, particularly in math. Reviewers of sex difference
research have, however, indicated the inconclusiveness of findings favoring
biological explanations of sex-related cognitive differences (Maccoby & Jacklin,
1974; Fennema, 1977; Sherman, 1977; Vandenberg & Kuse, 1979). An alternative
hypothesis, rivalling alleged biological influences, stresses the importance of
sociocultural factors in male-female performance.

Some have concluded that social values and behaviors more often shown by
girls tend to be those not associated with the successful development of
intellectual achievement (Maccoby & Jacklin, 1974; Fox, 1977). Nash (1979)
discusses evidence that individuals' gender-identity influences their
motivation to successfully perform on cognitive tasks that have been sex-typed
as male or female. Recent work by Kremer and Walberg (1979) strongly suggests
the importance of social psychological factors such as home environment and
student motivation in science learning and achievement. Fennema and Sherman
(1977) have demonstrated that much of the difference in mathematics achievement
between males and females is attributable to course-taking history. These
researchers also found that mathematics is viewed as a male domain by many
girls, thus resulting in their avoidance of math (Sherman & Fennema, 1977).

While considerable attention has recently been devoted to sex differences
in mathematics achievement, relatively little attention has been
given to this topic in science. Concerning sex differences and women's
achievement in science, the questions are still largely those of "if, when,
and where" sex differences exist in performance in this broad intellectual
area. While the percentage of doctorates in the sciences currently being
awarded to women is increasing and has once again reached the level of the
1920's, there have been significant shifts in this trend over intervening
decades, and women continue to be under-represented (Vetter, 1978). The
uniqueness of the perspective women bring to the sciences, and their potential
for high productivity in science, have been noted (Astin, 1978). Much stands
to be lost, in the face of lagging industrial and technological productivity,
if this potential is not developed.

Women's participation in science is dependent upon the quality
and the effectiveness of the science education they receive. More needs
to be known about women's achievement in science, and the social,
psychological, and even biological factors that influence it. If one assumes
that important cognitive differences exist between the sexes, then
understanding the nature and extent of these differences is important for
determining what type of intervention, if any, would be most effective. An
objective base for examining issues related to sex differences in science
achievement is needed if recommendations for educational policy are to be
made, and promising directions for research in this important area are to be
identified.

Toward an Objective Base 

A review of related literature has revealed the following issues that
need to be addressed in the development of an objective base for identifying
research priorities in sex differences and science learning, and for drawing
implications for educational policy. These are: male-female achievement
in different domains of science learning; male-female achievement in science
when cognitive, instructional, and attitudinal factors are controlled;
age-related trends in male-female science achievement; and the variance of
observed differences in male-female achievement with the chronological time
of investigation.
Domains of Science Learning 

In what domains of science learning (for example, factual learning, 
scientific processes, attitudes to science) do male-female differences 
occur? Research in math education has shown differential achievement for 
males and females in some areas of mathematics learning, but not in others. 
Could the same be true for science? If so, can differential achievement 
in science learning best be understood in terms of biological and genetic
hypotheses, or by social psychological explanations? 
Cognitive, Instructional, and Attitudinal Controls

If cognitive, instructional, and/or attitudinal factors are controlled,
are observed sex-related differences in science learning accounted for?
Fennema and Sherman (1977) have shown, for example, that studies of
particular abilities such as mathematics will be biased if course-taking is not
similar for males and females in the study samples. Moreover, studies
based on large random samples of secondary school students may be
comparing a more heterogeneous group of females with a more homogeneous,
intellectually motivated group of males if males are more likely than females
to drop out of school.
Age-related Trends 

Are there significant, age-related trends in sex differences observed
in science learning? Terman and Tyler (1954) report evidence for increasing
sex differentiation with age in the areas of abilities, interests, preferences,
and responses to personality inventories. Petersen and Wittig (1979) have
suggested that observed differences between the sexes increase with age as
socialization effects accumulate, and that puberty is likely to be a critical
time for the intensification of socialization effects. Conclusions about the
existence of sex differences, then, may well depend upon the age mix in the
set of studies in which sex differences are examined.

Chronological Time of Investigation

Do the frequency and magnitude of reported sex-related differences in
science learning appear to vary with the chronological time of investigation?
Are reported sex differences in performance in science of greater magnitude
in older, or more recent studies?

Maccoby and Jacklin (1974) noted: "As sex role behaviors have been less
rigidly defined and enforced, sex-related differences have decreased and
in many instances they have not been demonstrated." One might therefore
expect to see sex-related differences of a much smaller magnitude or lesser
frequency in more recent, as opposed to older, studies.



A Methodology 



Probably the best known work on the psychology of sex differences is
that of Maccoby and Jacklin. Among the most noteworthy contributions
of this extensive work is their systematic and analytic synthesis
of research. Often, what is considered "truth" is shown upon closer
analysis to be based on inadequate reporting, or the failure of researchers to
control for significant variables. Where popular beliefs are supported,
new complexities are often revealed. A synthesis of research findings on
sex differences in science learning employing quantitative meta-analysis
techniques (Glass, 1978) is warranted.
Rationale

Numerous studies currently exist which report the results of male-female
comparisons on measures of science learning and achievement. A recent
literature search revealed over 150 studies of science learning reporting
cross-sex comparisons. An integration of research reporting gender differences
on measures of science learning is needed at this time for a number of reasons.

The purpose of research synthesis is to determine what existing research
shows about the relationship of one variable, or class of variables, to another;
in this case, sex differences in science learning. While extensive reviews of
sex differences in cognition have been conducted (Wittig & Petersen, 1979),
it is often difficult to relate basic psychological research to educational
policy, even studies of cognitive processes basic to science learning. While
national surveys of science achievement based on random samples have been
conducted, and report results by gender, the policy implications to be drawn
from these studies are sometimes limited. They are limited by the lack of
control of important variables mediating science achievement (e.g., previous
instruction and attitude), and the exclusive focus on one (or possibly two)
outcomes. Even the application of secondary analysis techniques to these data
bases cannot go beyond such limitations. What is needed, therefore, is a
systematic integration of science education research incorporating findings
across numerous studies representing diverse samples, with outcome measures
reflecting several domains of science learning.

Extensive syntheses of science education research have been conducted
(Walberg, Boulanger, Kremer, & Haertel, 1980). However, much of the research
reviewed under the social, instructional, and ability constructs defined in
the guiding model does not routinely report male-female comparisons.
Furthermore, numerous investigations reporting sex differences were not
included in these reviews as they did not fall within the boundaries
established for the selection of literature. A synthesis of findings on sex
differences in science education research is therefore needed to address the
concerns of science educators and policy-makers in this important area.
Methods of Research Synthesis 

Syntheses of empirical research generally employ one, or a combination,
of the following methods: narrative reviews of the literature, box scores or
tallies of significant findings (Light and Smith, 1971), and quantitative,
statistical techniques as exemplified by meta-analysis (Glass, 1978) and the
joint probability method (Rosenthal, 1978). The use of box scores for
integrating research findings typically involves determining, for each study,
whether or not a statistically significant difference was found and, if so, its
direction (i.e., whether the treatment or control group was favored). Studies
are then tallied according to whether significant differences are reported, and
the direction of significant findings is noted. In the field of sex differences
research, the work of Maccoby and Jacklin best exemplifies the use of box
scores for research integration.
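
The box-score procedure just described can be sketched as a simple tally. The
comparison records, the .05 criterion, and the category labels below are
assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical comparisons: mean difference (group A minus group B) and its p value.
comparisons = [
    {"diff": 0.40, "p": 0.01},
    {"diff": 0.10, "p": 0.30},
    {"diff": -0.35, "p": 0.04},
    {"diff": 0.25, "p": 0.03},
]

def box_score(rows, alpha=0.05):
    """Tally comparisons by significance and, if significant, by direction."""
    tally = Counter()
    for row in rows:
        if row["p"] >= alpha:
            tally["no significant difference"] += 1
        elif row["diff"] > 0:
            tally["significant, favors A"] += 1
        else:
            tally["significant, favors B"] += 1
    return tally

scores = box_score(comparisons)
```

Note that the tally records only the direction of each significant finding;
the size of each difference is discarded.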

The joint probability method of research integration involves combining,
or pooling, the exact, one-tailed probabilities of each comparison reported.
Methods for combining probabilities are discussed and illustrated by Jones
and Fiske (1953), and by Rosenthal (1978). A still more powerful quantitative
method, however, is the meta-analysis technique developed by Glass (1978).
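
As a hedged illustration of pooling one-tailed probabilities, the sketch below
uses the "adding Zs" (Stouffer) rule, which is one of several combination
methods in the literature; the text does not say which rule would be used
here, and the p values are invented.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combined_p(one_tailed_ps):
    """Pool one-tailed p values: convert each to a z, sum, and divide by sqrt(k)."""
    normal = NormalDist()
    zs = [normal.inv_cdf(1.0 - p) for p in one_tailed_ps]  # each p must lie in (0, 1)
    z_combined = sum(zs) / sqrt(len(zs))
    return z_combined, 1.0 - normal.cdf(z_combined)

# Four hypothetical studies, each only marginally significant on its own.
z_pooled, p_pooled = stouffer_combined_p([0.05, 0.05, 0.05, 0.05])
```

The pooled probability is much smaller than any single study's, which is the
appeal of the method; like the box score, however, it says nothing about how
large the difference is.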

Meta-analysis is based upon the derivation of an effect size representing
a normalized measure of the difference between two comparison groups on a
measured outcome. The effect size expresses the magnitude of group differences
on a common scale, so that findings from studies employing different measures
and methods are rendered comparable. In contrast to methods employing box
scores and combined probabilities, the effect size calculated in a
meta-analysis has the advantage of providing an estimate of the over-all size
of an effect, in addition to its significance.
A Case for the Meta-Analysis of Sex Differences

Quantitative methods for synthesizing research entail more objective
processes for summarizing individual studies, and allow for more concise means
of displaying and interpreting results than more conventional reviews.
Meta-analysis in particular allows for the application of a broad range of
analysis techniques, from frequency distributions to multivariate methods.
Moreover, meta-analysis permits the integration of primary studies representing
diverse samples and outcome categories. This is a distinct advantage in the
formulation of well-rounded policy statements, and the definition of research
priorities.

The meta-analysis of research represents a particularly appropriate
technique for synthesizing studies on sex differences, since it results in a
statistical statement about the magnitude of differences between groups. This
is especially important since previous syntheses of research on sex differences
have been criticized for overemphasis on null hypotheses, and failure to note
the magnitude of differences (Block, 1975). Jacklin (1979) has recently called
upon researchers to go beyond the necessary first step of sorting findings by
statistical significance (as exemplified by the work of Maccoby & Jacklin,
1974), and employ techniques estimating the size of observed differences.

Conclusion 

A quantitative synthesis of research findings on sex differences delineating
both the frequency and magnitude of observed differences in science learning is
warranted. It is warranted on the basis that important issues concerning the
participation and education of women in the sciences have yet to be resolved,
and appropriate methodologies for the synthesis of research have now been
developed. Of concern are questions regarding male-female achievement in
different domains of science learning, particularly when critical instructional
and attitudinal factors are controlled; age-related trends in male-female
science achievement; and the observed variance of male-female achievement with
the chronological time of the investigation. The systematic integration of
existing research constitutes a necessary first step in the formulation of
meaningful hypotheses for further study, and in directing the concerns of
policy-makers.
Abstract

To assess the impact of the innovative pre-college science curricula of the
past 20 years on achievement, a search was conducted using the computer-assisted
Bibliographic Retrieval System (BRS), the ERIC Annual Summaries of Research in
Science Education, and Dissertation Abstracts International. A total of 197
comparison effect sizes were obtained from 33 studies representing 19,149
junior and senior high school students in the United States, Great Britain, and
Israel. Study-weighted analysis yielded an overall mean effect size of .308,
significantly favorable to the innovative curricula (t(25) = 2.183, p < .05).
Student performance in innovative curricula averages at the 62nd percentile
relative to the control norm. Tabulation of signed comparisons indicated
that 64 out of 81 unweighted outcomes were favorable to the innovative curricula.
Separate analyses for test content bias, methodological rigor, type of learning,
and student characteristics showed no significant differences across these
categories.
Beginning in the late 1950's and continuing [illegible], the American
taxpayer has supported scientists and educators in pre-college curriculum
development. Creation of innovative courses in science for grades 7 through 12
received special attention in the 1959-1973 period, accounting for approximately
two-thirds of the National Science Foundation's (NSF) curriculum development
expenditures for that period (NSF, 1975). NSF allocated $92 million to
development and implementation of 19 such projects. Associated with the effort
were extensive teacher-training summer and year-long institutes in the use of
the new programs and numerous evaluation studies of the classroom impact of the
innovative programs, usually in comparison to "traditional" curricula.

A debate about the effectiveness of the new programs began with the first
implementations in the early 1960's, especially the "new math". By the mid-70's
sufficient studies had accumulated that a summative judgement appeared possible.
Walker and Schaffarzick (1974) conducted a partial search of the literature and
located 26 studies which compared students exposed to different curricula in the
same subject on some measure of school achievement. Using statistical
significance as the criterion for counting a comparison, they reported a general
trend supporting the hypothesis that a treatment, whether innovative or
traditional, yields the higher score on tests biased in its favor. On tests
biased toward the innovative program, the innovative group performed better on
44 out of 45 significant comparisons. On tests biased toward the traditional
treatment, the traditional group performed better on 9 out of 14 comparisons.
Where test bias was neutral or could not be determined, both treatments were
equally effective. Unfortunately, out of 98 comparisons, 32 were not considered
in the count since significant differences were not found by the original
investigators.
Walker and Schaffarzick acknowledged both the limited extent of their search
and the inadequacy of statistical significance as a criterion for counting
comparisons. Objective quantitative techniques for synthesizing research,
moreover, that weight studies equally and also compare the effects in studies
categorized by validity, subject matter, and other characteristics were not
widely known in education.

Cohen and Hyman (1979) addressed this overreliance on alpha for decision
making in experimental studies and argued that effect sizes should be emphasized
instead. Use of effect size in planning and reporting studies would also aid
research synthesis by providing a measure of the size of the difference between
groups, quite apart from its statistical significance.

Glass (1978) has described a methodology for the use of effect size in research
synthesis. Glass' technique can provide better estimates of effects than
simple counts of comparisons to ascertain the cumulative meaning of a body
of research such as curriculum-evaluation studies.

Bredderman (1978) used Glass' approach to quantitatively synthesize the
results of over 60 evaluations of nationally developed elementary school (K-6)
science curricula. Using this technique, Bredderman was able to estimate average
effect sizes for each kind of program compared to traditional treatment controls,
and the degree of impact of the programs on different kinds of students. One of
Bredderman's major findings was that curricular effects on student achievement
were in harmony with curricular objectives and content, a finding basically in
agreement with the less well founded conclusion of Walker and Schaffarzick.

Welch (1979), drawing on the results of 82 studies, counted studies
supporting certain generalizations about the effects of the innovative curricula
on K-12 students. Welch's rough count of studies also gave results essentially
in agreement with Walker and Schaffarzick.

The purpose of the present study was to apply Glass' technique in the 
quantitative synthesis of the secondary school, grades 7 through 12, 
curriculum evaluation studies of the 1963-1978 period. Only the nationally
developed innovative program evaluations would be included. A thorough 
literature search combined with quantitative techniques that include all treatment 
comparisons would provide a new and critical look at the Walker and Schaffarzick 
conclusion. 

Among the techniques employed that permit more confident conclusions
than previous comparison counts are estimates of the size and significance of
the average effect size, and of the dependence of the effect size (which in this
case represents a comparison of performance under innovative and traditional
curricula) on the subject matter, grade level, type of outcome measure, and
methodological qualities of the study such as experimental and instrumentation
validity. In addition, "vote counts" of positive and negative contrasts of the
two types of curricula are reported to afford comparisons with the effect size
method as well as with the results of previous syntheses.
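
One common way to judge a vote count is an exact binomial sign test against an
even split. The sketch below is offered only as an illustration; the report
does not state that this test was applied. It uses the 64 favorable outcomes
out of 81 given in the abstract, and the function name is hypothetical.

```python
from math import comb

def sign_test_p(favorable, total):
    """Two-sided exact binomial p value for a vote count, null proportion 0.5."""
    k = max(favorable, total - favorable)
    upper_tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2.0 * upper_tail)

p_vote = sign_test_p(64, 81)  # vote count reported in the abstract
```

A vote count of this kind shows direction only, which is why it is reported
alongside, rather than in place of, the effect size analysis.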

Method 

Search and Selection 

Studies were identified through a search of the computer-assisted
Bibliographic Retrieval System (BRS), which provides access to Dissertation
Abstracts International, and the ERIC database of published and unpublished
articles. The ERIC annual summaries of science education research were also
consulted. The BRS search was conducted using the descriptors "curriculum
development," "innovative/innovation," "science education in courses,"
"secondary school science/elementary school science," and "1963-1978." A manual
scan of the two major journals, Science Education and Journal of Research in
Science Teaching, for the years 1963-1978 supplemented the ERIC summaries.
Studies that quantitatively compared traditional and nationally developed
innovative science curricula on student learning outcomes, grades 6 through 12,
were selected.

Thirty-three studies representing 19,149 students in the United States,
Great Britain, and Israel were chosen for investigation. Among these, 13
curricula are included, eight at the senior-high school level and five at the
junior-high school level. To investigate the Walker and Schaffarzick hypothesis,
9 of 23 of their sources were included in this investigation. Those omitted
did not meet either subject matter (science) or grade level (6 or higher)
criteria or both.

Coding

To allow quantitative synthesis of both study characteristics and outcomes,
the following study-variables were coded for each comparison: study origin and
source; subjects and setting, i.e., grade level, gender, ethnicity, academic
achievement level, community SES and urban-rural character; subject matter of
treatment; treatment characteristics such as group size, elective or required
course participation, regular or special teacher, lab or non-lab focus,
reliability of implementation, length and quality of control group access to
content; study design and nine categories of threats to validity; sample size;
dependent measure type, origin, reliability and innovative, traditional or
neutral bias, including an indication of the source of information on bias,
i.e., author description or independent inspection of the test; and outcome
statistics, i.e., direction of effect, level of significance and effect size.

The coded threats to validity were reliability of treatment; statistical
power; error rate; maturation; history; selection bias; contamination,
compensation or differential incentive; mortality; and generalizability.
They were categorized as either 1) potential threat or 2) adequately minimized.
An overall index of design quality was taken from the sum of these ratings.
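
A minimal sketch of such an index follows, assuming each of the nine threats
is rated 1 (potential threat) or 2 (adequately minimized) as described above.
The key names and the function are hypothetical labels, not the project's
codebook identifiers.

```python
# Hypothetical keys for the nine coded threats to validity.
THREATS = (
    "reliability_of_treatment",
    "statistical_power",
    "error_rate",
    "maturation",
    "history",
    "selection_bias",
    "contamination_compensation_incentive",
    "mortality",
    "generalizability",
)

def design_quality_index(ratings):
    """Sum the nine ratings; 9 means every threat is present, 18 means none is."""
    for threat in THREATS:
        if ratings.get(threat) not in (1, 2):
            raise ValueError(f"rating for {threat!r} must be 1 or 2")
    return sum(ratings[threat] for threat in THREATS)

# Example study: every threat minimized except selection bias.
example = {threat: 2 for threat in THREATS}
example["selection_bias"] = 1
index = design_quality_index(example)
```

Under this coding a higher sum indicates a better-controlled study, so the
index can be correlated with effect size like any other study-variable.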

The five dependent outcome categories were: 1) conceptual learning,
e.g., Concept Attainment Test (Cunningham, 1970); Taxonomy Test (Herron, 1966)
based on the comprehension, analysis and application levels of Bloom's (1956)
Taxonomy; and standardized achievement tests; 2) inquiry skills, e.g.,
tests of controlling variables, formulating hypotheses, critical thinking,
and logical operations; 3) attitudinal development, e.g., any measure of
attitude, interest, or opinion toward science or science related concerns;
4) laboratory performance, including observation, investigation and
manipulative skills with actual apparatus; and 5) concrete skills, i.e.,
classification of properties represented by pictorial stimuli. Unlike inquiry
skills, concrete skills require only observation and classification of directly
perceived objects or pictures. Inquiry skills require some form of
hypothetical-deductive reasoning as in Piagetian formal operations.

In coding the dependent measure, it was usually not difficult to distinguish 
between tests containing content favorable to the traditional versus the 
innovative programs. The Test on Understanding Science (TOUS), however,
while clearly not traditional in content, differs markedly from tests designed
by most investigators of the new curricula. Such tests require the student to
apply the inquiry skills gained in the innovative program, e.g., a transparency
test in BSCS biology (Mascolo, 1969). TOUS, while designed to
measure knowledge of the scientific point of view, does not require that 
inquiry skills be applied while taking the test. Consequently, although 
it is a non-traditional test, TOUS was grouped separately and coded neutral. 

The formula for calculating effect size was almost always one of the 
following (Glass, 1978): 



$$ES = \frac{\bar{x}_E - \bar{x}_C}{s_C} \quad \text{or} \quad ES = t\sqrt{\frac{1}{n_E} + \frac{1}{n_C}}$$

where $\bar{x}_E$ and $\bar{x}_C$ represent experimental and control group means respectively;
$s_C$ is the standard deviation of the control group; and $n_E$ and $n_C$ are the
experimental and control group sizes. Where applicable, t
is computed from the t-test statistic. When F was the result of a two-group
comparison, t was considered equal to $\sqrt{F}$. In cases of one-way analysis of variance,
homogeneity of variance was assumed, setting $s = \sqrt{MS_w}$. All effect sizes
favoring the innovative curricula were given a positive sign, those favoring 
the traditional curricula a negative sign. 
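
As an illustration only, the two routes to an effect size described above can be sketched in Python. The function and argument names are hypothetical, the example numbers are invented rather than taken from any reviewed study, and the t-based conversion is assumed to be the standard two-group form ES = t * sqrt(1/n_E + 1/n_C).

```python
import math


def es_from_means(mean_exp, mean_ctl, sd_ctl):
    """Effect size as the mean difference in control-group SD units."""
    return (mean_exp - mean_ctl) / sd_ctl


def es_from_t(t, n_exp, n_ctl):
    """Effect size recovered from a two-group t statistic (assumed form)."""
    return t * math.sqrt(1.0 / n_exp + 1.0 / n_ctl)


def es_from_f(f, n_exp, n_ctl, favors_innovative=True):
    """Two-group F: |t| = sqrt(F); the sign comes from the reported direction."""
    t = math.sqrt(f)
    return es_from_t(t if favors_innovative else -t, n_exp, n_ctl)


# Invented example values (assumptions, not data from the review).
print(round(es_from_means(52.0, 48.0, 10.0), 3))                 # 0.4
print(round(es_from_t(2.0, 25, 25), 3))                          # 0.566
print(round(es_from_f(4.0, 25, 25, favors_innovative=False), 3)) # -0.566
```

A positive value favors the innovative curriculum and a negative value the traditional one, matching the sign convention stated in the text.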

Weighting Procedure

The number of effect sizes computed from each study varied as a result
of both the quantity of comparisons and the quality of supplied data. Some
studies presented means and standard deviations for each of several test
categories. Other studies failed to give one or both of these statistics.
In consequence, the number of effect sizes per study ranged from one to 33.
To give equal weight to each study rather than to each comparison, each effect
size was assigned a weight equal to the reciprocal of the number of effect
sizes in its study. Each of the 33 effect sizes from one study received a
weight of 1/33. This procedure weights each study equally and yields a smaller
number of independent degrees of freedom than a count of unweighted comparisons,
which are not statistically independent. Unweighted signs were used in one
instance, however, to allow comparison with Walker and Schaffarzick's data.
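
The weighting rule can be sketched as follows. This is a minimal illustration with invented study labels and effect sizes; the helper names are hypothetical and not part of the original analysis.

```python
from collections import Counter


def study_weights(study_ids):
    """Weight each effect size by 1 / (number of effect sizes in its study)."""
    counts = Counter(study_ids)
    return [1.0 / counts[s] for s in study_ids]


def weighted_mean(values, weights):
    """Weighted average of the effect sizes."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented data: study "A" reports three comparisons, study "B" reports one.
study_ids = ["A", "A", "A", "B"]
effect_sizes = [0.1, 0.2, 0.3, 0.8]

weights = study_weights(study_ids)
print(sum(weights))                          # about 2.0, the number of studies
print(weighted_mean(effect_sizes, weights))  # about 0.5; each study counts once
print(sum(effect_sizes) / len(effect_sizes)) # about 0.35; study A dominates
```

Because the weights within a study sum to one, the total weight equals the number of studies, which is why the weighted analysis has fewer degrees of freedom than a count of raw comparisons.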

Data Analysis

A visual analysis of the set of effect sizes was made by plotting a 
stem-and-leaf diagram, displayed in Table 1.

Insert Table 1 about here. 

To obtain an overall measure of the impact of the innovative versus the
traditional curricula, a mean effect size was computed. To check for a
systematic influence of any coded study-variable on effect sizes, a one-way
analysis of variance was conducted on each study-variable, with effect size as the
dependent variable. The categories of the study-variables were either the actual
coded categories or a collapsed version of the coded categories. Grade level,
for example, was converted to two categories, grades 6-9 and 10-12, while
separate categories of chemistry curricula were compared. The chemistry programs
were: CHEMS (e.g. Hardy, 1970; Herron, 1966; Heath and Stickell, 1963; Pye
and Anderson, 1967; and Rainey, 1964), CBA (Heath and Stickell, 1963; and Pye
and Anderson, 1967), Nuffield (e.g. Kempa and Dube, 1974; and Meyer, 1970), and
MCA (e.g. Charen, 1963). Table 2 lists all study-variable categories, category
means, and F-test results. 
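
A one-way analysis of variance of this kind can be reproduced approximately from category means, standard deviations and frequencies alone. The sketch below is an illustration, not the authors' program: the function name is hypothetical, and the input uses the Gender rows as read from Table 2 (Male, Female, Mixed). Because the table's frequencies are rounded from fractional weights, only approximate agreement with the reported F of .58 should be expected.

```python
def one_way_f(groups):
    """groups: list of (mean, sd, n) tuples; returns the one-way ANOVA F ratio."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# (mean effect size, standard deviation, frequency) for Male, Female, Mixed,
# as read from the Gender rows of Table 2.
gender = [(-0.07, 0.47, 3), (0.10, 1.00, 2), (0.38, 0.74, 21)]
f_ratio = one_way_f(gender)
print(round(f_ratio, 2))  # 0.58
```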

Insert Table 2 about here.

A study-variable of special interest was outcome test bias. In addition to 
the analysis of variance (see Table 2) on the test bias variable, a tabulation
("vote count") of test bias against the number of effect sizes favorable to
each curriculum type was made and is presented in Table 3. Unlike Walker
and Schaffarzick, the analysis also included nonsignificant effect sizes
in the tabulation.



Insert Table 3 about here. 
Results and Discussion 

Overall Effect 

The weighted mean of the effect sizes is .308, and the standard deviation
is .717. A t-test (t(25) = 2.183; p < .05) indicated that this mean is
significantly different from zero and favorable to the innovative curricula.
Converting the results to percentiles, and placing students taking traditional
courses at an average 50th percentile, students taking innovative courses scored
on average at the 62nd percentile.
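
The arithmetic behind these two statements can be checked with a short sketch. It assumes 26 study-weighted effect sizes (consistent with the 25 degrees of freedom reported) and a normal distribution of scores for the percentile conversion; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_es = 0.308   # weighted mean effect size reported in the text
sd_es = 0.717     # standard deviation of the weighted effect sizes
n_studies = 26    # assumed from the reported t(25)

# One-sample t statistic for the hypothesis that the mean effect size is zero.
t_stat = mean_es / (sd_es / math.sqrt(n_studies))

# Percentile of the average innovative-course student when the average
# traditional-course student is placed at the 50th percentile.
percentile = 100 * NormalDist().cdf(mean_es)

print(round(t_stat, 2))     # about 2.19
print(round(percentile))    # 62
```

The computed t is about 2.19; using the table's total of .307 instead of .308 gives about 2.18, in line with the reported 2.183.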
Distribution of Effect Sizes 

Stem-and-leaf diagrams (Table 1) give an indication of distribution and 
magnitude of effect sizes, weighted and unweighted. Stems (on the left of the 
vertical line) are broken down into intervals of .2; leaves represent the first 
decimal place (tenths) of the effect size. The -.0 to .0 interval includes 26
unweighted effect sizes (ten falling in the negative .0 range, 16 falling in the
positive .0 range; see diagram on the left) and four weighted effect sizes
(two positive and two negative; see diagram on the right). Both stem-and-leaf
diagrams indicate a predominance of positive data points; 104 out
of 151 of the comparison-weighted effect sizes are positive. There is a
difference of 46 between the number of sign comparisons (197) and the
number of computed effect sizes (151) as a result of studies
which were lacking information necessary to compute effect sizes. The 26
study-weighted effect sizes (on the right), 19 positive and 7 negative,
are also based only on studies for which there was enough data to compute
effect sizes. Peaks at .1 and .2 are consistent in both diagrams. The
relatively larger number of points above .3 than below -.3 accounts for a
mean of about .3 in both cases.

Influence of Study Variables 

The overall result does not depend on test bias (Table 2). The high weight
frequency innovative and neutral tests show a clear superiority of innovative 
over traditional curricula. The large number of neutral tests in 
Table 2 reflects the number of tests designed to favor neither the innovative
nor traditional curricula in content and therefore classified as neutral. For
example, Wasik (1971) analyzed the items of the College Entrance Examination
Board (CEEB) Physics Achievement Test into categories of the Taxonomy (Bloom,
1956) and found evidence to support its neutrality with respect to both PSSC
and non-PSSC students. Cunningham (1970) designed the neutral Concept
Attainment Test based on "Refraction," a topic covered by both PSSC and non-PSSC 
students in his sample. 

Table 2 also shows that neither subject matter nor type of dependent outcome
significantly affected the innovative-traditional effect size mean.
Principal subject characteristics such as grade level, gender, and academic 
achievement level also did not have any influence on the effect size mean. 

Among design quality features, "unit of analysis" yielded individual and 
group means of .23 and .88, respectively (F = 2.27, p < .14). An increase
in effect size resulting from group means is to be expected because individual
subjects often are a source of variation. However, the relatively small
frequency for "group" (N = 3) should be taken into consideration. When
those studies involving TOUS (which happened to report group means) were excluded from
the analysis, the group mean was .65 (F = .854, p < .37). Any inferences
regarding "unit of analysis" are qualified by the presence of TOUS
in some studies.
Data Comparisons 

It may be recalled that, for reasons pertaining to selection criteria, not
all of Walker and Schaffarzick's sources were included in the present
investigation. The 23 studies used by those investigators yielded a total of 98
comparisons. Of the studies yielding the 197 raw comparison-effect sizes in the
present analysis, nine overlap between the two investigations.
From the 14 studies omitted from the present analysis, Walker and Schaffarzick
reported 44 comparisons out of 53 significantly favorable to the innovative
curricula (Table 3). This applies primarily to their findings for elementary
students and mathematics curricula. 

The top of Table 3 provides vote counts of signs of Walker and Schaffarzick's
and the present data. Although this allows for comparison of all comparisons
favorable to each treatment, the weighted tabulation at the bottom
of Table 3 is a more accurate reflection of the results of the present study
since it gives equal weight to each independent study. Either procedure
(including Walker and Schaffarzick's data) yields a ratio of approximately
four to one in favor of combined outcomes favorable to the innovative curricula.
By the vote count method, Walker and Schaffarzick found stronger evidence
than the present study for the superiority of the innovative curricula on the
innovative tests (44 out of 45 significant outcomes for their data; 29 out of 37
for the present data). But the present data, both unweighted and weighted,
show the superiority of students taking innovative programs on neutral tests
(31 out of 38 unweighted outcomes; 6 out of 7 weighted outcomes) and on
traditional tests in the unweighted data (4 out of 6).
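
The "approximately four to one" ratio can be checked directly from the column totals of Table 3, setting aside the I=T (no difference) outcomes. The sketch below is illustrative; the dictionary keys are descriptive labels chosen here, and the counts are the I>T and T>I totals as read from Table 3.

```python
# (comparisons favoring innovative, comparisons favoring traditional)
vote_totals = {
    "present data, unweighted": (64, 17),
    "Walker and Schaffarzick data": (53, 13),
    "present data, study weighted": (13, 3),
}

ratios = {label: i_gt_t / t_gt_i for label, (i_gt_t, t_gt_i) in vote_totals.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} to 1")
```

The three ratios fall between roughly 3.8 and 4.3, which is the sense in which either procedure gives about four to one.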

Conclusion

Although great national interest in science curricula by the general public 
and professional educators may have abated in the 1970s, the post-Sputnik (1958)
curricula produced beneficial effects on science learning that extended across
science subjects in secondary schools, types of students, various types of
cognitive and affective outcomes, and the experimental rigor of the research.
Past reviews showed the percentage of positive results; but the present analysis 
shows a moderate 12 point percentile advantage on all learning measures of 
average student performance in the innovative courses. 

Contrary to Walker and Schaffarzick, who used earlier methods of research
synthesis and concluded that performance merely reflects content exposure,
the present results suggest that students in innovative secondary-school science
courses score moderately better (Effect Size = .308) than students in traditional
courses on both innovative and neutral tests and negligibly lower (Effect Size = -.04)
on tests favoring traditional science content.

Table 2

Descriptive Statistics for Independent Variables*

Mean ES = mean effect size; SD = standard deviation; Freq. = frequency (sum of cases).
Blank cells and labels marked [illegible] could not be read in the source.

Study Variable                               Mean ES      SD   Freq.       F   Prob.

Totals                                          .307    .717      26

Dependent Measure
Test Bias                                                                .18     .84
  Innovative                                     .36     .69      11
  Traditional                                   -.04     .00       1
  Neutral                                        .29     .78      14
Test Bias Information Source                                             .23     .87
  Independent                                    .19     .18       7
  Test Description (Author)                      .33     .86      17
  Lacking Information                            .35     .58
Test Type                                                                .42     .74
  General                                        .33     .48       5
  Discipline                                     .08     .73       7
  Course/Curriculum                              .54     .75       6
  Science                                        .29     .85       8
Origin                                                                   .03     .86
  Local                                          .28     .70      11
  Published                                      .33     .76      15
Reliability                                                              .93     .34
  >.80                                           .72     .00       1
  [illegible]                                    .29     .72      25
Validity                                                                 .15     .70
  Adequate                                       .32     .74      24
  Inadequate                                     .09     .24       2

Study Source                                                             .24     .63
  Refereed Journal                               .33     .74      23
  Dissertation                                   .11     .47       3
Subject Matter                                                           .42     .83
  Biology                                        .41     .66       6
  Chemistry                                     .001     .63       6
  Physics                                        .52    1.03       6
  General Science                                .37     .39       4
  Physical Science                              -.01     .00       1
  Integrated, Unified Science                            .73       1
Chemistry Curricula                                                      .76     .69
  CHEMS                                          .14     .35       1
  CBA                                          -1.45     .00       0
  Nuffield                                      -.01     .81       1
  MCA                                            .08     .00       1
Outcome Measure                                                          .19     .96
  Conceptual                                     .39     .87      13
  Inquiry                                        .21     .41       6
  Attitude                                       .16     .62       4
  Lab Performance                                .59     .00       1
  Concrete Skills                                .16    1.29

Subjects and Setting
Location                                                                 .53
  USA                                            .32     .75      20
  Great Britain                                 -.02     .71       3
  Israel                                         .59     .51       3
Grade Level                                                              .22     .64
  Below 9                                        .20     .51       7
  9 - 12                                         .35     .79      19
Gender                                                                   .58     .57
  Male                                          -.07     .47       3
  Female                                         .10    1.00       2
  Mixed                                          .38     .74      21
Ethnicity                                                                .61     .62
  White (mixed)                                  .25     .68      23
  Other                                          .72     .00       1
  Not reported                                   .78    1.42       2
Academic Achievement Level                                               .86     .44
  High                                           .03     .83       4
  Medium                                         .41     .71      20
  [illegible]                                   -.14     .47       2
SES                                                                      .31     .87
  Middle Class                                  -.11     .33       2
  Upper Middle Class                             .39     .55       2
  Mixed                                          .61     .96       3
  Not reported                                   .30     .76      18
Community                                                                .23     .95
  Urban                                          .30     .40       5
  Suburban                                       .10     .45       5
  Rural                                          .28     .53       3
  Mixed                                          .25     .90       7
  Not reported                                   .53    1.07       6
Sample Size                                                              .36     .90
  50 or fewer                                    .64    1.52       3
  51 - 100                                       .56     .79       5
  101 - 200                                      .17     .48       5
  201 - 500                                      .22     .66       6
  501 - 750                                      .15     .31       4
  > 750                                          .25     .27       2

Treatment Characteristics
Experimental/Control Group Size                                          .09     .92
  Comparable                                     .38     .90       9
  Different (+5)                                 .27     .63      16
Experimental Group Participation                                         .30     .75
  Elective                                       .23     .63      15
  Required                                       .34     .91       8
  Either (i.e. Biology)                          .58     .71       3
Experimental Teacher                                                     .72     .41
  Regular                                        .24     .78      19
  Special                                        .50     .52       7
Control Teacher                                                          .08     .78
  Comparable to Experimental                     .32     .75      24
  Different                                      .17     .18       2
Focus of Instruction                                                     .22     .81
  Non-lab                                        .51     .45       3
  Lecture and lab                                .29     .78      21
  Lab only                                       .08     .43       2
Quality of Instruction                                                   .10     .76
  Curriculum, Course                             .28     .78      20
  Teacher Behavior and Material                  .39     .50       6
Measure of Variable                                                      .22     .89
  Self Report                                    .08     .43       2
  Expert Report                                  .61     .00       1
  Pre-determined in structure of materials       .31     .67      17
  Cannot be determined                           .33    1.01
Length of Treatment                                                      .42     .74
  Less than 1 hour                              3.27     .00       0
  1 week (11-50 hours)                           .35     .58       2
  Course (10 weeks or more)                      .29     .72      23
Control Group Access to Treatment                                        .42     .66
  None                                           .30     .73      24
  Comparable                                      .0     .00       1
Sample Selection                                                        1.28     .31
  Simple Random                                  .44     .91       8
  Purposive (extreme)                            .55     .00       1
  Matching                                               .67      12
  Convenience                                    .23     .31       5
Unit of Analysis                                                        2.27     .14
  Individual                                     .23     .53      23
  Group                                          .88    1.68       3
Design                                                                   .58     .57
  Quasi-experimental                             .31     .73      24
  [illegible]-experimental                       .61     .00       1
Quality of Design                                                       .006     .94
  Average                                        .30     .83      18
  High                                           .32     .39       8

* Weighting procedure rounds reported statistics to the nearest integer.
All tests are run with fractional figures included.

Quality of design is computed from the sum of potential threats: high quality
indicates threats were minimized.

Table 3

Innovative-Traditional Comparisons

Present Data
                           Result of Comparison
Test Bias                  I>T     T>I     I=T   Total
Favors innovative           29       8      26      63
Favors traditional           4       2       7      13
Neutral                     31       7      65     103
Totals                      64      17      98     179

Walker and Schaffarzick Data
                           Result of Comparison
Test Bias                  I>T     T>I     I=T   Total
Favors innovative           44       1       7      52
Favors traditional           5       9      16      30
Neutral                      4       3       9      16
Totals                      53      13      32      98

Present Data - Study Weighted
                           Result of Comparison
Test Bias                  I>T     T>I     I=T   Total
Favors innovative            7       1       5      13
Favors traditional           0       1       2       3
Neutral                      6       1       8      15
Totals                      13       3      15      31

Note: I = Innovative Curricula; T = Traditional Curricula

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The purpose of this paper, as suggested by the title, 
is to reflect on recent writings on research-synthesis techniques
as well as our own experience in integrating research
findings across studies. These reflections draw largely
on the writings of N. L. Gage and Gene V Glass as well as 
our ongoing work in analyzing studies in the areas of open 
education, quality of instruction, reading instruction 
methods, learning environments, and home environments. 

This paper is a working draft for a small, informal 
conference that is to assist in the planning of synthesis 
work for the National Science Foundation in the specific 
areas of quality and quantity of instruction, ability and 
motivation, home and peer environments, and the social- 
psychological environment of the class—all in relation 
to educational outcomes. Therefore, the ideas brought out 
here should be considered preliminary for the reactions 
of the two discussants at the conference and two or three 
other people conducting research syntheses. 



Reactions to Glass' Chapter

Gene Glass properly notes at the beginning of his present
review chapter that conventional or traditional narrative
reviews of research may have been adequate in the past when
only five or ten studies were available for analysis. However,
the growth in educational and psychological
research in the last several decades has made it necessary
to use more advanced statistical techniques for synthesizing
the findings from the many original studies now available.
The full range of statistical techniques, from
elementary frequency distributions all the way through multiple
regression analyses, can usefully serve to synthesize findings.
It is immensely difficult in most areas of educational
and psychological research these days for an investigator
or reviewer to fully understand the meaning of the results
unless they are somehow condensed, preferably by objective
statistical techniques.

Glass also points out that although the techniques
of Light and Smith are valuable in synthesizing original
data from studies, they may be somewhat impractical, particularly
in education, since it is often difficult to gain the
original research data. Both the independent and dependent
variables from most educational research studies are measured
on noncomparable scales. In addition, many
investigators are unable or unwilling to share their original
data. And when they do, the scales cannot be effectively
compared; for example, even percentiles and grade equivalents
make for questionable comparisons because the norming of
different instruments from different publishers is based
on different sub-populations in different years, all of
which introduce bias and error into the analysis.

In his section on problems of access to data, Glass
makes a number of excellent points on ERIC, dissertations,
microfilms, and other techniques and materials that can
help those attempting to gather studies for syntheses.
Both backward and forward citations can be helpful in identifying
a population of studies for review. A point needing
emphasis is that it is becoming commonplace for reviewers
to dismiss, on certain selected methodological grounds, whole
bodies of research literature. Very often quite extensive
reviews of studies wind up with two conclusions: 1) that
more research needs to be done, and 2) that the prior studies
are so weak that nothing can be concluded from a corpus
of literature. Although it is possible to argue this case,
I believe it is basically unconstructive for building a
science of education. The essence of science is the accumulation
and replication of evidence. Many hours of work
have gone into these studies, and reviewers who conclude
that nothing can be made of them tend to dismiss not only
others' work but the integrity of the field of educational
research as well.
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I believe that a more positive approach must be urged 
upon those who wish to synthesize research. The techniques 
of meta-analysis have already shown that it is quite possible
to draw definite conclusions across studies when the synthesis
techniques are powerful and sensitive. Glass and Smith's
reviews of class size and psychotherapy, and Robert Horowitz
and Penelope Peterson's reviews of open education have already
shown consistent results across hundreds of comparisons
that suggest substantive conclusions with definite policy
and practical implications. Such research syntheses can
also indicate which types of methodological virtues and
flaws seem to be the most decisive in determining the outcomes
in question. Lastly, this work has shown which particular
areas of research within a given topic have been infrequently
studied and hence can point to the most decisive kinds of
studies that can be done in future work.

Although Glass makes a number of constructive and practical
suggestions for doing high quality synthesis in educational
research, an additional point needs to be made. It is possible
to set up such high standards for literature selection and
search that the extent of the investigation goes far beyond
the reaches of a particular investigator's budget, time,
and energies. The investigator who begins a research synthesis
needs to think through very carefully the trade-offs between
the scope of the literature and the various analytic techniques.
Just as no single empirical study can ever be done perfectly,
it seems unlikely that any meta-analysis of a significant
topic in education can be done without some imperfections.

Therefore it is important to plan the scope of the 
literature and the selection of studies as well as the analytic 
techniques early so that the investigation does not get 
out of hand. My personal experience in conducting and 
advising on large scale primary studies indicates to me 
that too often investigators are overly ambitious in their
efforts: instead of bringing to completion and publishing
studies of a modest scope, they often set such high standards
and vast scope as not to be able to finish. Too often,
it goes unrecognized in educational research that replication
is the essence of science. Often three or four modest studies,
if they agree in their results, can be much more creditable
than a single, very large study, especially if the large
study expends too much of the effort and budget on data collection
rather than analysis, reflection, and writing. It
can be added that replication by independent investigators
should be as important in syntheses as it is in primary
research.



One idea of Glass was also used by Robert Rosenthal 
and can be valuable in cutting down the size of a research 
synthesis so that it can be manageable and completed within
a given time schedule. In some areas it is possible to find
hundreds of studies; when they cannot all be reviewed,
it makes perfectly good sense to use the standard techniques
of statistical sampling. A simple random sample can be
taken; or every third or tenth study might be selected.
In addition, stratified sampling may be helpful; that is,
an investigator can group the studies into several sets 
and take a random sample within each set. This is another 
idea that is just as appropriate in research syntheses as 
it is in primary research studies. 
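
The three sampling schemes just mentioned can be sketched in a few lines. This is an illustration under assumed inputs: the list of 300 study records, the "level" field used for strata, and the function names are all hypothetical.

```python
import random


def simple_random_sample(studies, k, seed=0):
    """Draw k studies at random without replacement."""
    return random.Random(seed).sample(studies, k)


def systematic_sample(studies, step, start=0):
    """Take every step-th study, e.g. every third or tenth."""
    return studies[start::step]


def stratified_sample(studies, stratum_of, k_per_stratum, seed=0):
    """Group studies into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for study in studies:
        strata.setdefault(stratum_of(study), []).append(study)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return sample


# Hypothetical population of 300 located studies.
studies = [
    {"id": i, "level": "elementary" if i % 2 else "secondary"}
    for i in range(300)
]

print(len(simple_random_sample(studies, 30)))                        # 30
print(len(systematic_sample(studies, 10)))                           # 30
print(len(stratified_sample(studies, lambda s: s["level"], 5)))      # 10
```

Fixing the seed makes the selection reproducible, which helps when an independent investigator wants to replicate the synthesis on the same sample of studies.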

Another technique that may be used to cut down research 
syntheses to manageable size is to establish some selection 
criteria from the very beginning. Although, of course, it is 
desirable to gain a very large population of studies to
cover all areas, the investigator can consider the important
policy questions or substantive interests in setting up
criteria for admissible studies. For example, one could
select studies that have been done after a particular date,
or one might confine the scope of the syntheses to elementary
and secondary schools. James Kulik confined his syntheses
of the Personalized System of Instruction to studies in higher
education. One might also decide to confine the studies
to those which have examined the effects on particular
outcomes. Glass and Smith, for example, have done one analysis
of the effects of class size on student achievement measures.
In subsequent and separate syntheses, they plan to analyze
the effects of class size on attitudinal outcomes.
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Both Glass, and Light and Smith, and Gage are critical 
of the voting method. This method pertains to those reviews 
that count the number of favorable effects, mixed and nonsignificant
effects, and unfavorable effects for some particular
educational treatment. I share many of their reservations
about the voting method because, as Glass points out, "to
know that televised instruction beats traditional instruction
in 25 of 30 studies is not to know whether TV wins by a nose
or a walk-away." And, as Glass further points out, one
ought to average measures of the strength of the effects
or relationships among the variables rather than simply
tabulate their sign and possibly significance.

On the other hand, if one has a choice of the voting 
method or no summary at all, it is perfectly clear that 
knowing how many studies showed favorable and unfavorable 
results is of greater relative value. In addition, the 
voting method and more complicated procedures that take 
into consideration the probability or strength of the findings 
will often come to the same conclusions. The voting method
has the advantage of being readily understood by educational 
practitioners, and it is a simple method to use on the 
part of educational researchers. It may not uncover subtle, 
interactive, small effects; but if there are consistent 
effects across a series of studies they will certainly be 
revealed by the voting method.

I believe that the techniques of analyzing correlations
and effect sizes are far superior to the voting method.
But having half a loaf in a field that badly needs syntheses
is better than no loaf at all. Moreover, the new techniques
of research syntheses are quite complicated to use
and to understand, even on the part of well trained investigators.
It is also true that the beginning efforts at quantitative research
syntheses have already received some initial skepticism.
Since there are seeds of doubt as to which
particular analytic or summary techniques should be used,
it may be advisable in the next five years, as we gain experience
in using the techniques, to use both the simpler techniques,
to encourage understanding on the part of those who will
digest the findings, and the more complex techniques
such as regression of effect sizes, which are certainly more
powerful and sensitive to complicated effects in the data.

Another point that deserves amplification in the Glass 
review is the overemphasis that researchers and reviewers 
have given to statistical significance. It is obvious that 
many educational researchers have used students inappropriate- 
ly as the units of analysis so that the significance levels 
are inappropriately high. Moreover, it should be much more 
impressive to us that an effect is consistently positive
across a series of studies or investigators and laboratories
than that several of the studies happen to be statistically
significant.

Glass praises Gage's recent informative book on integrating
studies on teaching but criticizes Gage's advocacy of
Pearson's Chi-square test for integrating probabilities
across studies on the grounds that the number of studies
will be so large and encompass so many subjects that null
hypotheses will be routinely rejected. There are complicated
statistical arguments on this matter that still require
solution. However, it may be some time before our statistical
colleagues settle some of these issues. Therefore, it would
seem reasonable in the meantime for educational research
synthesists to try using several of the techniques simultaneously.
In addition to statistical issues, how convenient
the procedures are to employ and how easily educational
practitioners can understand them require consideration.
Hopefully, before we settle on one method, simultaneous analyses
will indicate the same results and thus will satisfy advocates
of the several rival techniques and allow us all to concentrate
on substantive and policy implications.

Another valuable point made by Glass is that a variety 
of simple and complicated techniques can be used in doing 
research syntheses. Tables, graphs and simple descriptive 
measures of location and spread will enable readers to comprehend
the idea of syntheses across studies. In addition,
multiple regression can parsimoniously summarize in one
equation all of the results of the syntheses. Such a combination
of simple and complicated techniques was recently
used by Margaret Uguroglu in her analysis of the relation
of motivation and achievement. The correlations between
these two variables were tabulated separately for different
grade levels, for different subject matter areas, and for
different motivation constructs. The reader is allowed
to see where the gaps in the data are and the general tendencies
with respect to the major variables. After digesting
these findings the reader is presented with a multiple regression
equation that neatly summarizes the results with
several coefficients, showing that the size of the correlation
depends principally on the age of the students and the reliability
of the measure of motivation. Going from the simple
to the complex is a good pedagogical technique that enables
readers to gain an understanding of the univariate dependencies
before going on to a more complex multivariate synthesis.

Glass and Rosenthal independently arrived at a statistic
called the "effect size", which is simply the mean of an
experimental group minus the mean of a control group divided
by the within-groups standard deviation or the control-group
standard deviation. Glass and Smith used the control-group
standard deviation in their meta-analysis of psychotherapy
but used the combined within-group standard deviation
in their meta-analysis of class size. The control-group
standard deviation may have two advantages: 1) it
enables the investigator to determine where the experimental
group lies in the metric of an untreated control, and 2)
it can be readily copied from reports that give separate 
standard deviations for experimental and control groups. 
On the other hand, the within-groups standard deviation 
makes use of the complete variation within both groups and 
may be a more stable measure. 
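
The two versions of the effect size described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `effect_size`, the `pooled` switch, and the score lists are hypothetical and are not taken from Glass, Rosenthal, or Smith.

```python
from math import sqrt
from statistics import mean, variance


def effect_size(experimental, control, pooled=False):
    """Mean difference divided by a standard deviation.

    pooled=False divides by the control-group standard deviation;
    pooled=True divides by the pooled within-groups standard deviation.
    """
    difference = mean(experimental) - mean(control)
    if not pooled:
        return difference / sqrt(variance(control))
    n_e, n_c = len(experimental), len(control)
    pooled_variance = ((n_e - 1) * variance(experimental)
                       + (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    return difference / sqrt(pooled_variance)


# Hypothetical scores; the experimental group is more variable than the control.
experimental_scores = [12, 15, 18, 21, 24]
control_scores = [10, 12, 14, 16, 18]

print(round(effect_size(experimental_scores, control_scores), 2))
print(round(effect_size(experimental_scores, control_scores, pooled=True), 2))
```

When the two groups differ in spread, as in this made-up example, the two denominators give noticeably different effect sizes, which is why a synthesis should state which one it uses.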

Glass presents a number of statistical techniques for 
calculating effect sizes when only the F or t ratios are
given. These techniques will be valuable for including 
the maximum number of studies; but in the interest of 
efficiency it would also be possible, although perhaps less 
desirable, for an investigator to eliminate a priori those 
studies that did not have full statistics as one of the 
selection criteria. 
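
As an illustration of recovering an effect size from a reported test statistic, the sketch below uses the standard textbook identity for two independent groups, d = t * sqrt(1/n_e + 1/n_c), which gives the effect size in pooled within-groups units. It is not claimed to reproduce Glass's own procedures; the function names and the sample sizes are hypothetical.

```python
from math import sqrt


def d_from_t(t, n_e, n_c):
    """Effect size (pooled within-groups units) from an independent-groups t."""
    return t * sqrt(1.0 / n_e + 1.0 / n_c)


def d_from_f(f, n_e, n_c, experimental_higher=True):
    """Effect size from a two-group F ratio (one numerator degree of freedom).

    F loses the sign of the difference, so the direction must be read
    from the study itself and supplied by the coder.
    """
    t = sqrt(f)
    return d_from_t(t if experimental_higher else -t, n_e, n_c)


# Hypothetical study: t = 2.0 with 50 students per group.
print(round(d_from_t(2.0, 50, 50), 2))
print(round(d_from_f(4.0, 50, 50), 2))
```

The F conversion applies only to a two-group comparison, where F equals t squared.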

In areas of research which are basically correlational 
rather than experimental or quasi-experimental, it is generally 
advisable to analyze correlations. For example, as discussed 
by Glass, White found that 636 available
correlations of socio-economic status and achievement averaged
.25 with a standard deviation of about .20. The correlation
diminished as students got older; it decreased
from about .25 in the primary grades to about .15 late in
high school. Socio-economic status also correlated higher 
with verbal mathematic achievement than other outcomes. 
Glass points out that there is no good reason to transform
the correlations to Fisher's Z since it will seldom make
much practical difference. Glass also gives a series of
guidelines for converting t's, point-biserial correlations,
and contingency-table statistics to Pearson correlations.
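
Both points can be illustrated with a short sketch. The conversion shown is the standard identity r = t / sqrt(t^2 + df) for a point-biserial correlation, not necessarily the exact guideline Glass gives, and the three correlations are hypothetical values chosen near the averages reported by White.

```python
from math import atanh, sqrt, tanh


def r_from_t(t, df):
    """Point-biserial correlation implied by a t ratio with df degrees of freedom."""
    return t / sqrt(t * t + df)


def mean_r(correlations):
    """Plain average of the raw correlations."""
    return sum(correlations) / len(correlations)


def mean_r_via_fisher_z(correlations):
    """Average taken in Fisher's Z metric, then transformed back to r."""
    mean_z = sum(atanh(r) for r in correlations) / len(correlations)
    return tanh(mean_z)


print(round(r_from_t(2.0, 96), 2))

hypothetical_rs = [0.15, 0.25, 0.35]
print(round(mean_r(hypothetical_rs), 3))
print(round(mean_r_via_fisher_z(hypothetical_rs), 3))
```

For correlations of this moderate size the two averages agree to within about a hundredth, which is consistent with the remark that the transformation seldom matters in practice.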

In the closing sections of his review Glass mentions
the problems of differential weighting of studies either
by the number of comparisons or the number of students on
which a particular comparison is made. These weighting
problems can prove highly complex. A small study of one
hundred students may make as many as 20 comparisons if,
for example, the comparisons are broken down by sex and
outcome measure and independent variables of various kinds.
On the other hand, a study of one thousand may report only
the means of an experimental and control group. In simply
analyzing the average effects over all these comparisons
one would be weighting the smaller study 20 times as much
as the larger study. Glass makes a point, with which I would
agree, that we usually do not have the luxury of throwing
out the smaller studies since, when various classifications
are done, the cell sizes for comparisons may be too small.
Therefore, it will generally be necessary to analyze all
the comparisons.
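
The weighting problem can be made concrete with a small sketch. The study labels and effect sizes below are hypothetical; the point is only that averaging over comparisons and averaging over studies can give quite different answers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: a small study reporting 20 comparisons and a
# large study reporting a single comparison.
comparisons = [("small_study", 0.60)] * 20 + [("large_study", 0.10)]


def mean_over_comparisons(rows):
    """Every comparison counts once, so the small study counts 20 times."""
    return mean([effect for _, effect in rows])


def mean_over_studies(rows):
    """Average within each study first, so every study counts once."""
    by_study = defaultdict(list)
    for study, effect in rows:
        by_study[study].append(effect)
    return mean([mean(effects) for effects in by_study.values()])


print(round(mean_over_comparisons(comparisons), 3))
print(round(mean_over_studies(comparisons), 3))
```

Neither average is correct in itself; reporting both shows the reader how much the conclusion depends on the weighting chosen.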

It is possible to deal with this problem through a
weighted regression analysis, which is available in some
computer programs. Another and possibly easier alternative
is to perform what economists call a sensitivity test, that
is, to eliminate one or several studies at a time from the
analysis to determine how the overall results are affected.
Still another possibility is to plot either the effect sizes 
or correlations or the residuals from a regression on a 
variable such as the numbers of cases on which comparisons are 
being made. By examining such scatter plots unusual results 
can usually be detected. If the results do not appear to 
be determined by sample size or other characteristics it 
can be safely concluded that the results are not dependent 
on aberrations of sample sizes or other variables. 
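
A minimal sketch of the sensitivity test described above, dropping one study at a time and recomputing the overall mean effect. The study labels and effect sizes are hypothetical, and `leave_one_out` is an illustrative name rather than a routine from any particular program.

```python
from statistics import mean


def leave_one_out(effects_by_study):
    """Mean effect over all comparisons, then again with each study omitted."""
    everything = [e for effects in effects_by_study.values() for e in effects]
    results = {"(all studies)": mean(everything)}
    for omitted in effects_by_study:
        kept = [e for study, effects in effects_by_study.items()
                if study != omitted for e in effects]
        results[omitted] = mean(kept)
    return results


# Hypothetical effect sizes; study C is unusually favorable.
effects_by_study = {"A": [0.2, 0.3], "B": [0.25], "C": [0.9, 1.0, 1.1]}

for label, value in leave_one_out(effects_by_study).items():
    print(f"omitting {label}: mean effect = {value:.3f}")
```

In this made-up case the overall mean falls from about .63 to .25 when study C is removed, the kind of dependence on a single study that the test is meant to expose.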

Reactions to Glass and Smith's Analysis of Class Size 

In Glass and Smith's analysis, a number of examples 
of research synthesis techniques make more concrete the 
comments that Glass has made earlier and that I, in turn, 
have commented upon. These points deserve emulation in 
future research syntheses. Glass and Smith begin by noting 
that prior reviews have been haphazard and over-selective
in reviewing the literature. Moreover, the reviews are
narrow and discursive, use crude classifications, and overemphasize
statistical significance. In contrast, Glass
and Smith's analysis shows very definite and significant
beneficial effects of small class sizes.

Glass and Smith uncovered some 80 studies which exceed 
by 50% the number in the largest prior review, but they estimate 
that they perhaps found only about half the studies that
might be found using still more exhaustive search procedures.
Hundreds of dissertations were scanned but only 30 seemed
worth purchasing and 16 were actually useful. That only
80 studies of perhaps 160 that may be eventually uncovered
were found in a fairly exhaustive search shows that there
are diminishing returns in attempting to find additional
literature. The fact that Glass and Smith went back some
seventy years to uncover fugitive materials and ordered
dissertations and unpublished studies indicates that such
diminishing returns are likely to occur. 

One could always recommend that the additional 80 studies
should have been sought out; but, as I have emphasized above,
it may be impractical to do so. In fact, contrary to Glass
and Smith, it might be argued that dissertations and unpublished
reports should have been excluded. The published
literature is more readily accessible, and is likely to
be refereed and of higher caliber. Obtaining a dissertation
or unpublished report may require three times
the effort of obtaining a published paper. Renouncing unpublished
material might have made it possible for Glass and Smith
to review the effect of class size on not only achievement
but also affective outcomes with the same amount of time,
energy, and budget. Either strategy would be defensible,
but it should be emphasized that a trade-off among the
different areas of effort is required in making a research
synthesis.

Glass and Smith describe several studies in detail;
these illustrate, as they point out, the characteristics
and texture of the research literature that is reviewed.
To simply report statistics, and particularly numbers from
the tables, would be inadequate; the reader needs a qualitative
feel for a few illustrative studies to understand
the statistical results.

In planning the coding of studies, Glass and Smith
identify characteristics that may interact with class size
in determining achievement levels. First they read a few
studies, then talked with experts, and finally made their
best guess as to which characteristics of the studies should
be coded. Modifications could later be made in the coding;
but if one changes the coding, all the studies that have
been coded up to that point need to be re-coded.

Glass and Smith used five broad categories in categori- 
zing the studies: 

(1) study identification, 

(2) method of instruction, 

(3) classroom demographics, 

(4) study conditions, and 

(5) outcome variables. 

In all they included 25 specific continuous and qualitative variables
under these five categories. Including these specific
variables makes it possible for the analyst to determine whether
the relation between class size (or any other variable
being investigated) and achievement is dependent on the
characteristics of the study or the characteristics of the
populations being investigated, such as elementary and secondary
schools. In our work, we have used more detailed
and exhaustive categories than did Glass and Smith. For
example, instead of simply using three or four experimental
design categories, we have used the threats to validity
from the Cook-Campbell chapter, which are much more extensive.
This only goes to emphasize the various kinds of trade-offs
of energies that can be planned.

On further reflection about our own work, I believe the
simpler Glass and Smith characterization of experimental
designs is more practical. It requires recording only one variable
with several levels. By contrast, we are
using approximately 14 variables, each with about three
levels. Such detailed coding of methodological characteristics
has the disadvantage of requiring more time but permits
the option during the analysis of either grouping or not
grouping methodological characteristics.

On page 12 of their review, Glass and Smith note that
the within-group standard deviation was used in their analysis.
We noted, however, that in their analysis of psychotherapy
the control-group standard deviation was used. As commented
on earlier, each of these two metrics
has advantages. On page 18, Glass and Smith report
their results in percentile metrics rather than Z-scores.
In any final tabulation of the results it is possible to
present these in either one of the terms. Most educators
will understand the percentile results better and hence
the Glass and Smith report on class size will be readily
comprehensible to practicing educators. One of the many
fine characteristics of the Glass and Smith report is the
extensive use of concrete information, particularly numerical
information, throughout the report. For example, on page
19 it is stated that 77 studies were reviewed, 725 effect
sizes were calculated, and that these were based on some
900,000 students over a period of 70 years of research in
12 countries.
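
The translation from the Z-score metric to the percentile metric can be sketched as follows, on the assumption (made here for illustration, not stated by Glass and Smith) that scores in the comparison group are normally distributed. The function name is hypothetical.

```python
from math import erf, sqrt


def percentile_from_effect_size(effect_size):
    """Percentile of the comparison-group distribution reached by the
    average member of the treated group, assuming normality."""
    return 100.0 * 0.5 * (1.0 + erf(effect_size / sqrt(2.0)))


for es in (0.0, 0.25, 0.5, 1.0):
    print(f"effect size {es:.2f} -> percentile {percentile_from_effect_size(es):.1f}")
```

An effect size of zero leaves the average student at the 50th percentile, and an effect size of one standard deviation moves that student to about the 84th.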

Beginning on page 20, Glass and Smith provide a series
of univariate tabulations which make clear how many times
various samples and measures have been researched in the
class-size literature; one is able to identify from
these figures the under-studied and over-studied areas.
It might be added that it would be useful in the Glass and
Smith report not only to present the frequencies with which
samples have been investigated but also the average effect
size for the various cells so that the reader can see the
univariate dependencies before going on to controlled
comparisons and grand means across all studies.

Glass and Smith perform what economists call sensitivity 
analyses. They note, for example, that the relationship 
between class size and achievement is stronger in those 
studies that have randomly assigned students to class sizes 
in strict experimental design terms than in correlational 
studies. These results give strong reason for imputing
causality, since they discredit rival hypotheses such as
the co-determination of size and achievement by educational
spending or community social class. Glass and Smith point
out that whether experimental controls mediate findings is an
empirical, not an a priori, question. In their review of
psychotherapy, it was not found that experimental rigor
determined the strength of the relationships, but in the
present study of class size it did.

On page 42, Glass and Smith note stronger effects on 
elementary students. It appears that the age or grade levels 
of students should be included in most meta-analyses of 
educational effects because, in the reviews we have been examining,
the age level has usually mediated the relationships between
the independent and dependent variables. For example, Uguroglu
found stronger relationships between motivation and achievement
in older secondary samples than in elementary school
samples. White found that the correlation between social 
class and achievement was higher in younger samples. 

In conclusion, the fine work of Glass and Smith is likely
to become a classic synthesis in educational research. I
believe it essentially settles the class-size question after
so many years of uncertainty and controversy and points
confidently to the benefits of smaller classes. Ten more
syntheses of this quality will make educational research
a science of results rather than of mere methods.

Comments on Gage's Book

Gage has contributed a number of useful insights for 
investigators who are about to begin research syntheses.
Although the book centers on the quality of instruction,
his insights have implications for other substantive areas. 

On page 26, Gage makes a central point which should 
be considered in all research syntheses. Nine prior narrative 
reviews of the effects of teaching on learning conclude 
that educational research has not identified the consistent 
replicable features of teaching that are related to student 
outcomes. Gage points out, however, that these conclusions 
may be due more to the faults of the reviewers than to the 
totality of original research itself. Reviewers have made 
a great number of errors in attempting to synthesize the 
research. Many studies of teaching, for example, are based 
on limited numbers of teachers. Therefore, the results
may not be statistically significant. On the other hand, 
to return to an earlier point made here, "replication is 
the essence of science." It is not two or three significant 
relationships that are important, but rather consistency 
of the direction and the magnitude of the effects across 
many investigations. 

Gage criticizes Dunkin and Biddle's exhaustive review,
titled The Study of Teaching. Not only did Dunkin and Biddle
err in over-emphasizing statistical significance, they were
not explicit in stating how studies were categorized as
showing positive or negative effects of a particular teaching
technique. Just as explicit procedures are necessary in
primary research, it is important that explicit objective
procedures be followed in reviewing research as well.
Dunkin and Biddle, despite the great length of their book,
do not describe exactly how a determination was made of
which particular variables had favorable effects on student
outcomes. They claim to use subjective clinical procedures,
but these procedures are not spelled out. One has
no way of knowing exactly how they were accomplished or how
a person could repeat the procedures as a check on the reviewers.
Such a review must be an argument basically from
authority rather than categorized evidence. Such arguments
are not in the domain of science.

It might be added that the Dunkin and Biddle review is
in some respects more in the nature of advice to practitioners
than a report to scientists. There is an inherent conflict of
interest between the practitioner and the scientist that
occasionally plagues education. The scientist wants to know
exactly how the results were obtained, whereas practitioners
might be satisfied with conclusions and advice. This
conflict seems to be a major difficulty in many research
reviews where conclusions gain more prominence than the procedures
for reaching them and the nature of the evidence.

On page 38, Gage makes the excellent point that some
teaching variables do not vary over a broad range. It
is well known that if a variable does not vary it cannot
co-vary effectively or with statistical significance with
another variable. Gage wisely recommends considering
the variability of the independent variables of teaching
when considering their relationship to student outcomes.

Another interesting point made by Gage is the need 
for separating the results of different educational outcomes. 
In addition to citing the work of Kulik and McKeachie,
he cites evidence that higher level thought questions 
seem to produce lower levels of achievement among students. 
The results suggest that lower-order questions produce 
more factual achievement and higher level questions produce 
better results for higher cognitive levels of achievement. 
If the results are mixed together, the analysis will be 
insensitive to an important distinction that applies in
the research data.

An additional example is Horowitz's box score tabulation 
of open-education effects later confirmed by Peterson's 
tabulation of effect sizes. Both of these reviews indicate 
that open education overall seems to lead to slightly 
lower factual achievement on standardized examinations 
but strongly higher levels of performance on tests of 
creativity and independence and various affective outcomes.
This work confirms Gage's point that the results from 
different outcome variables need to be tabulated separately. 

Comments on Uguroglu's Synthesis

The Uguroglu-Walberg synthesis of the relationship 
between motivation and achievement suggests a number of
points that can be mentioned here. One of the first 
points to be brought out in the Uguroglu paper is the 
chestnut that correlation does not imply causation. Simply 
tabulating hundreds of correlations between motivation 
and achievement does not establish, for example, whether 
motivation causes educational achievement or achievement 
causes higher levels of motivation or whether both factors 
are caused simultaneously by other variables. Nevertheless,
estimating the general correlation between the two
and showing how the correlation varies across various
samples and types of motivation are useful in establishing
what Blalock has called "an inventory of causes and effects".
In this particular instance, if the correlation is found
to be consistently positive, motivation ought to be entered
into experimental and survey designs that hope to elicit
the causal dependency of achievement on the production
factors in education.

The Uguroglu paper illustrates a dilemma of educational 
psychology. Since the James-Lange theory of emotion there 
have been many theories of motivation. There has been
much armchair speculation and voluminous writing
in the field. Nevertheless, empirical work in education
and psychology rarely fits a particular psychological
theory or tests one theory against another in their power
to explain empirical results. Therefore there is a great
gap between theories and empirical work, and it is usually
difficult to establish the constructs being investigated 
from the empirical works. For example, Shavelson mentions 
22 review articles on self-concept alone that show 17 
different conceptual categories. Since self-concept is 
only one sub-construct of motivation, one can see that 
the total number of constructs and sub-constructs can 
be quite large and beyond the limits of synthesis. It 
will be difficult to find several studies of each of the 
sub-constructs and therefore difficult to establish the
relation or correlation of each sub-construct with different 
types of educational outcomes. 

On page 4, Uguroglu gives a one or two sentence overview
of each of five major views of the field of motivation,
which will be useful for readers who want to get particular
perspectives beyond the summary of empirical relationships
between motivation and educational outcomes. On pages
5 and 6 Uguroglu introduces the idea of replication in
meta-analysis by taking from the works of Benjamin Bloom
a calibration sample of 122 correlations. Working in
an explicit framework, she also searched through Psychological
Abstracts and Reading Research Quarterly for a three-year
period for more recent studies. The empirical analysis
can ask if the overall correlation estimated by Bloom is
replicated in the validation sample. Uguroglu tabulates
the correlations by sample size, grade level, sex of the
sample, reliability of the motivation measure, nationality,
and characteristics of the motivation and outcome measures.
It appears that the age of the sample and the characteristics
of the measures, including their reliabilities, are
likely to turn out to be significant determinants of
the correlations between education production factors
and educational outcomes; these characteristics then should
be included in future meta-analyses.

On page 8, Uguroglu presents stem-and-leaf diagrams.
These diagrams show each individual data point in
the total sample. This gives readers a concrete feeling
for the range of the data and aberrant data points. The
stem-and-leaf diagrams for the calibration and validation
samples show the interesting distributional properties
of the two. The calibration sample reveals a slight tendency
toward bi-modality, with peaks at about .30 and .51. The
validation sample is more normally distributed but has
a few negative correlations based on the younger children
in the primary grades. The validation sample also has
two outlying correlations, .98 and -.31. Stem-and-leaf
diagrams are a useful way to introduce statistical analysis
of correlations to readers who may not be familiar with
the idea.
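
For readers unfamiliar with the display, the following sketch builds a simple stem-and-leaf diagram for a set of correlations, using the tenths digit as the stem and the hundredths digit as the leaf. Only the two outlying values, .98 and -.31, come from the text above; the remaining correlations are hypothetical, and the layout is one common convention rather than a copy of Uguroglu's figures.

```python
from collections import defaultdict


def stem_and_leaf(correlations):
    """Return display lines: stem = tenths digit, leaf = hundredths digit.

    Negative correlations get their own stems, listed first.
    Intended for correlations strictly between -1 and 1.
    """
    rows = defaultdict(list)
    for r in correlations:
        stem, leaf = divmod(round(abs(r) * 100), 10)
        sign = -1 if r < 0 else 1
        rows[(sign * stem, sign)].append(leaf)
    lines = []
    for signed_stem, sign in sorted(rows):
        label = ("-" if sign < 0 else " ") + "." + str(abs(signed_stem))
        leaves = "".join(str(leaf) for leaf in sorted(rows[(signed_stem, sign)]))
        lines.append(f"{label} | {leaves}")
    return lines


sample = [0.12, 0.15, 0.31, 0.34, 0.38, 0.51, 0.98, -0.31]
print("\n".join(stem_and_leaf(sample)))
```

Each leaf is one correlation, so the reader sees both the shape of the distribution and the individual outlying values at a glance.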

Pages 9 and 10 discuss the dependency of the correlation 
of achievement and motivation on characteristics of the 
samples and the measures employed. These can be understood
as one-way analyses of variance. The tables present the
average correlation for the cell, the standard deviation
of the correlations, and the number of correlations on
which the mean and standard deviation are based. The
first of these tables, for example, shows that the linkage
between motivation and achievement is higher in the older
samples and quite low in the very young samples,
in fact in some cases negative. Relatively simple tabulations
introduce gradually the idea of comparisons
across the independent variables.
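
A table of this kind (mean, standard deviation, and count for each cell) can be produced with a few lines. The grade-level labels and correlations below are hypothetical and serve only to show the layout.

```python
from collections import defaultdict
from statistics import mean, stdev


def tabulate_by_cell(records):
    """Map each cell label to (mean r, standard deviation of r, number of r's).

    The standard deviation is None for a cell with a single correlation.
    """
    cells = defaultdict(list)
    for label, r in records:
        cells[label].append(r)
    table = {}
    for label, rs in cells.items():
        table[label] = (mean(rs), stdev(rs) if len(rs) > 1 else None, len(rs))
    return table


# Hypothetical correlations of motivation with achievement by grade level.
records = [("primary", -0.05), ("primary", 0.10), ("primary", 0.15),
           ("secondary", 0.30), ("secondary", 0.40), ("secondary", 0.50)]

for label, (m, sd, n) in tabulate_by_cell(records).items():
    print(f"{label:10s} mean = {m:.2f}  sd = {sd:.2f}  n = {n}")
```

Reporting the count beside each mean lets the reader see which cells rest on only a few correlations.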

The regression control results, however, offer a 
much more parsimonious accounting for the significant 
trends in the data. Experimenting with various forms
of the regression equation makes it possible to find that
smaller sets of variables account for just about as
much variance as the entire 25 variables that first entered
the equation. Moreover, it can be concluded from these
regressions with some degree of confidence that significant
variables in the regression are controlled for one another,
and even if the variables excluded from the equation were
to be entered those that are in the final equation would
still be significant. The regression yields a parsimonious
set of potent, unique determinants of the relationship
between motivation and educational outcomes. 

It is also possible to exclude unusual studies such
as those with the two outlying observations that were
mentioned above. In this case, excluding the outlying
studies made very little difference in the regression
weights. On the other hand, the very large sample size
from the Coleman report suggested that the larger the
sample, the smaller the correlation. However, omitting
the Coleman report from the analysis suggests that this
trend is not consistent in the other studies. Therefore,
it is attributable only to the Coleman study because of
its large magnitude.

The results further suggest that one of the complicating
but significant results is attributable to the accidents
of only one or two studies, particularly a large study that
contains many correlations in mathematics achievement.
Uguroglu expresses skepticism that these results would necessarily
be confirmed by other studies.

On page 15, the analyses suggest that more reliable
measures paradoxically are less closely associated with achievement
than less reliable measures, a result which may also be uncovered
in other meta-analyses. This strange finding probably
stems from the tendency of more internally consistent tests
to have narrowed factorial content; higher internal consistency
yields lower external consistency, that is, correlations
with external criteria. Narrowing the scope of a prediction
instrument diminishes its relationship to the very criteria
it is intended to predict.

Conclusions

In ruminating about several research syntheses by
others as well as our own experience, it appears to me that
the techniques of Gage, Glass, Light, Smith, and others
will accelerate progress in educational research. Our most
precious resource in formulating educational policy is the
true experiment with random assignment to conditions in
natural settings of learning, but these are comparatively
rare in educational research. Nevertheless, we are able
to draw on areas of research that have employed correlational
or quasi-experimental designs. We cannot conclude from
the correlational relationships established from these that
certain production factors actually cause achievement, but
if they are supported by plausibility as well as empirical
confirmation they have to be suspected as possible causes,
just as the linkage between cigarette smoking and lung cancer
should suggest caution about smoking. Thus tabulation
and analysis of correlational relationships, if nothing 
else, can produce inventories of causes and effects that 
ought to be taken into consideration in future work. The 
strongest correlates suggest those factors that ought to 
be investigated with experiments. 

If in certain situations experiments cannot be done, 
then the investigator is obliged to fall back on correlational 
studies. Correlational studies that only take into consideration
one or two possible causal determinants should
be far less convincing than those studies that include
the complete set of consistent correlates of outcome measures.

It seems clear to us even at this preliminary stage
of the study of educational productivity that the following
plausible, consistent correlates of outcomes need to be
considered: student ability (including prior achievement)
and motivation, the quality and quantity of instruction,
and the home, school, and peer environment. Including these
factors, even in experiments, can prove valuable because
these factors are consistent, potent covariates. By including
them in regression equations one can get a more precise
estimate of the weight of the factor of interest, for example,
quality of instruction controlled for all the other factors.
By having a consistent model that includes most or all of
these factors in subsequent research, the replicability of
a more fully specified equation, rather than simply the relationship
of one independent variable to one dependent
variable, can be more solidly established.

It is even more important to include these factors
in correlational studies because they do not randomly
assign the chief variable for investigation. It is well
known, for example, that the home environment, that is,
the intensity and amount of educationally-stimulating interaction
between the parent and the child, is a potent correlate of
achievement and indeed of achievement gain, so that it
would be important to include this variable in research
on quality of instruction. Children who are receiving higher
quality of instruction may also have more stimulating home
environments. Children who are stimulated at home can in
fact evoke higher quality of teaching in their classroom.
Individual children can demand this sort of attention from
the teacher, and children in schools in stimulating
neighborhoods can evoke better teaching on the part of the
faculty. This suggests causation from the home environment
to both achievement and quality of teaching.

One way of investigating these effects is to include
all possible causes in the equation. Those that survive
screening regression techniques and make a unique contribution
to the explanation of educational outcomes can have
greater credibility. However, advanced techniques of
econometric analysis such as two-stage least squares regressions
are even more powerful in sorting out reverse-cause
and third-cause phenomena in educational data sets,
particularly in panel data in which multiple units are measured
on multiple occasions.
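
The logic of two-stage least squares can be shown with a deliberately tiny, artificial example. Everything in it is an assumption made for illustration: `home` stands for an unmeasured third cause, `instrument` for a hypothetical variable that shifts quality of instruction but is unrelated to the home environment, and the true effect of instruction on achievement is set to 2.

```python
def slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, x, y):
    """Stage 1: predict x from the instrument z. Stage 2: regress y on the prediction."""
    first_stage = slope(z, x)
    mean_z, mean_x = sum(z) / len(z), sum(x) / len(x)
    x_hat = [mean_x + first_stage * (zi - mean_z) for zi in z]
    return slope(x_hat, y)


# Artificial data constructed so that the instrument is uncorrelated with home.
instrument = [-1, -1, 1, 1]
home = [-1, 1, -1, 1]                                   # third cause
instruction = [z + h for z, h in zip(instrument, home)]
achievement = [2 * q + 3 * h for q, h in zip(instruction, home)]

print(slope(instruction, achievement))                  # ordinary regression
print(two_stage_least_squares(instrument, instruction, achievement))
```

Ordinary regression credits instruction with part of the home effect and overstates its weight, while the two-stage estimate recovers the value built into the data. Finding a defensible instrument in real educational data is, of course, the hard part.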

We may be at square one with respect to what needs
to be done to develop an equation for estimating educational
productivity, but the research synthesis of prior literature
to develop an inventory of possible causes and effects will
be a major step forward. Subsequent research which includes
the potent constructs and sub-constructs which are identified
in the research synthesis can take into consideration a
more complete set of possible causes. This kind of research
can rapidly accelerate the accumulation of knowledge about
the causes of educational achievement and other outcomes
as well as develop a more adequate scientific and practical
theory of educational productivity.

UICC-NSF META-ANALYSIS PROJECT
CODE BOOK

Barbara K. Kremer and F. David Boulanger

Coding Schemes for
Individual Study Characteristics and
Meta-Analysis Statistics

The purpose of a coding scheme is to provide a quantitative, computer-retrievable
summarization of the key characteristics of each comparison and
each correlation in each study included in the meta-analysis. Since a
single comparison or single correlation is the unit around which the coding
scheme is constructed, there will often be several code sheets for one
study.

Each coding scheme has three parts: 

(1) General Characteristics of the study 

(2) Specific Characteristics of the Construct under Study 

(3) Methodological Characteristics and Meta-Analysis Statistics

The form of the General Characteristics section is identical in each
coding scheme. Among other things, it includes the identification of the
dependent variable to which the effect-size or correlation reported at the
end of the coding scheme is related.

The Specific Characteristics of the Construct section has eight different
forms, one corresponding to each construct considered in the meta-analysis project,
namely: Motivation, Ability, Age or Developmental Level, Quantity
of Instruction, Quality of Instruction, Home Environment, Peer Environment,
and Classroom Environment. A given comparison or correlation extracted from
a given study will be coded according to the independent or predictor variable
into one of these constructs. Most studies will report variables relevant to
only one construct.

The form of the Methodological Characteristics section is, like the De- 
scriptive Characteristics section, identical in each coding scheme. The 
methodological flaws recorded here will form one criterion for the selection 
of studies to be included in various parts of the later statistical analysis.
For example, a comparison of outcomes between true and quasi-experiments in
a particular construct might be of interest; or it may be desirable to exclude
all studies with certain flaws. 

The last entry in the Methodological section is the correlation or 
effect-size that relates the independent or predictor variable in the 
Specific Characteristics section to the dependent variable in the General 
Characteristics section. As noted earlier, one study may be coded in nearly
identical manner on several code sheets with only the two variables and
the correlation or effect-size differing among sheets.

Code sheets for each construct are attached at the end of the Code Book.
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Section I: General Characteristics of the Study 



Section III: Methodological Characteristics and 
Meta-Analysis Statistics 



(each code sheet will contain each of these sections) 
I. General Study Characteristics and Dependent Variable 



COLS.   Study Identification 

1- 4    Sheet Number (four digits) 

5-16    Author: last name, comma, additional last names 

17-18   Year of study (last two digits) 

19-21   Number of study (three digits) 

22      Country of origin 

        1 = U.S. and Canada 
        2 = Britain 
        3 = Australia 
        4 = Other English-speaking countries 
            (e.g., as in Africa) 
        5 = Non-English speaking countries 

23      Source of Reference 

        1 = refereed journal 
        2 = ERIC (not dissertation) 
        3 = dissertation or thesis abstract 
        4 = unpublished research report 
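
The column layout above lends itself to fixed-width parsing. The following Python sketch is one possible reading, not the project's own software: the field names, the helper `parse_identification`, and the sample card are all hypothetical, and only the column spans (1-4, 5-16, 17-18, 19-21, 22, 23) are taken from the listing.

```python
# Column spans (1-indexed, inclusive) as listed in the Study Identification block.
FIELDS = {
    "sheet_number": (1, 4),
    "author": (5, 16),
    "year": (17, 18),
    "study_number": (19, 21),
    "country": (22, 22),
    "source": (23, 23),
}

def parse_identification(card):
    """Slice the identification fields out of one fixed-width card image."""
    card = card.ljust(23)  # pad short lines so every slice is defined
    return {name: card[start - 1:end].strip()
            for name, (start, end) in FIELDS.items()}

# Hypothetical card image, built field by field to keep the columns aligned.
card = "0012" + "WALBERG".ljust(12) + "79" + "003" + "1" + "1"
record = parse_identification(card)
```

Values are kept as strings so that blank columns (information not given) remain distinguishable from a zero.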






COLS.   Science Learning Outcomes 

24-25   01 = Cognitive Achievement, General, e.g., 
             Standardized Achievement Test or any 
             test with some mix of cognitive fact, 
             concept, process, logical operation. 

        02 = Factual, i.e., identification or recall 
             of specific information previously learned. 

        03 = Conceptual, i.e., generalization of a 
             concept to a new situation. Not 
             factual. Not identified by the author 
             as process or logical operation. 

        04 = Process, i.e., identified by the author 
             as process outcomes and, on inspection, 
             not factual category. 

        05 = Logical operations in Piaget's theory, 
             i.e., identified as logical operations 
             and, on inspection, not factual category. 

        06 = Attitudes and interests toward science, 
             scientists, science careers, science 
             instruction. 

        07 = Critical thinking or creative applications. 
             Identified by the author as critical or 
             creative thinking and, on inspection, 
             not factual category. 

        08 = Lab skills or performance test. 
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Dependent Measure 



1 = General 

2 = Discipline specific 

3 = Curriculum or course specific 



1 = locally developed test 

2 = published test 



Reliability of Outcome Measure (leave blank where 
not given) 

internal consistency (enter value) 

interobserver reliability (enter value) 

stability, test-retest (enter value) 

alternate forms (enter value) 

1 = adequate consideration of outcome measure 
    validity. (Does the dependent measure 
    represent a reasonable approximation of 
    the outcome variable under consideration, 
    without "teaching to the test"?) 

2 = inadequate consideration of outcome measure 
    validity 



Grade Level of Subjects 

(enter "median" grade if more than one considered; 
lower grade of two) 

00 = Kindergarten/preschool 
01 = Grade 1 
02 = Grade 2 
03 = Grade 3 
04 = Grade 4 
05 = Grade 5 
06 = Grade 6 
07 = Grade 7 
08 = Grade 8 
09 = Grade 9 
10 = Grade 10 
11 = Grade 11 
12 = Grade 12 
13 = College or Adult 



Sex of Subjects 

1 = male 

2 = female 

3 = mixed sex sample 

Ethnicity of Subjects 

1 = Black 
2 = White 
3 = Latino 
4 = Oriental 
5 = Mixed ethnic sample 
6 = Other ethnic, including foreign studies 
7 = not specified 



Academic achievement level of subjects and/or 
academic aptitude (IQ). Assume medium unless 
otherwise specified. 

1 = high 

2 = medium (as specified by verbal statement or 

90-110 on intelligence measure) 
Subjects' SES 

1 = poor, disadvantaged 

2 = middle class (including working and lower 
    middle class) 

3 = upper middle class 

4 = upper class 

5 = mixed SES sample 

6 = not specified 

Community type 

1 = urban 

2 = suburban 

3 = rural 

4 = mixed sample with regard to community 

5 = not specified 

Disciplinary Focus of the Study 

01 = Biology 

02 = Chemistry 

03 = Physics 

04 = General Science 

05 = Earth Science 

06 = Life Science 

07 = Physical Science 

08 = Integrated or Unified 

09 = Environmental Science 

10 = Behavioral Science 

Curricular focus of Study 

1 = Nationally funded curriculum project (BSCS, 

HPP, ISCS, S-APA, ESCP, etc.) 

2 = Conventional, traditional, locally developed, 

unspecified. 

Consideration of Production Factors in Study 
Classroom environment 

1 = omitted 

2 = measured and employed in analysis (includes 

use of measure in stratification, blocking, 
covariation) 

3 = exemplary 







Ability 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 

Motivation 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 

Quality of Instruction 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 

Quantity of Instruction 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 

Home 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 

Peer 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 
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Age/Developmental Level 

1 = omitted 

2 = measured and employed in analysis (includes 
    use of measure in stratification, blocking, 
    covariation) 

3 = exemplary 
III. Methodological Characteristics and Meta-Analysis Statistics 



COLS. 
1- 4 
5 



10 



Sheet Number (enter) 

Sample selection 

1 = simple or stratified random 

2 = purposive (a priori) sample (extreme or specialized 
    group) 

3 = matching 

4 = convenience or ill-specified sample 

Unit of analysis 

1 = individual 

2 = group 

Study design (Campbell & Stanley) 

1 = correlational 

2 = quasi-experimental 

3 = true experimental (random assignment) 

Reliability of treatment implementation 

1 = low; treatment and implementation poorly 
    described and documented 

2 = adequate; treatment and implementation clearly 
    described and documented 

3 = high; treatment described or documented with 
    observational checks on implementation 

Statistical power 

1 = inadequate 

2 = adequate, i.e., 6 or more classes; 100 or more 
    individuals total in 2 comparison groups or in 
    correlational group 

Error rate (Given the number of comparisons or correlations, 
is the p level sufficiently low to assure a less than .05 
chance occurrence of this relationship?) 

1 = inadequate p level 

2 = adequate, i.e., p less than .05 






Maturation (Have factors within units rather than the 
treatment brought about the differences observed?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

History (Have external factors in the environment rather 
than the treatment brought about the differences observed?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

Selection Bias (Do pre-existing differences among the groups 
account for later observed differences?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

Contamination, Compensation, Differential Incentive 
(Do untreated control groups work harder, work less, 
or somehow gain benefits or lose incentive due to 
influences from treated groups or teachers?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

Mortality (Do different dropout rates account for 
observed differences?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

Generalizability (Can results be generalized to other 
times, units, or settings with similar demographic 
characteristics?) 

1 = probable threat 

2 = adequately minimized 

3 = information not provided 

Correlation; if positive, leave sign space blank, if 
negative, write minus sign (-) in sign space. 
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Type of Correlation 

1 = partial 

2 = part 

3 = zero-order 

Effect size 

Enter 99.999 if not computable 

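Enter effect size following the formula or other approach 
recommended by Glass: 

    ES = (Xexp - Xcon) / SDcon 

where Xexp and Xcon are the experimental and control group means 
and SDcon is the control group standard deviation. 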
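
A minimal Python sketch of this computation is given below for illustration. It assumes raw scores are available for both groups and uses the sample (n - 1) standard deviation of the control group; the function name `glass_effect_size` and the example scores are hypothetical, while the 99.999 sentinel comes from the instruction above.

```python
from statistics import mean, stdev

NOT_COMPUTABLE = 99.999  # code entered when no effect size can be computed

def glass_effect_size(experimental, control):
    """(experimental mean - control mean) / control standard deviation."""
    if len(experimental) < 1 or len(control) < 2:
        return NOT_COMPUTABLE  # a standard deviation needs two control scores
    sd_control = stdev(control)  # sample SD (n - 1), an assumption here
    if sd_control == 0:
        return NOT_COMPUTABLE  # no spread in the control group
    return (mean(experimental) - mean(control)) / sd_control

# Hypothetical scores: control mean 12 (SD 2), experimental mean 15.
es = glass_effect_size([13, 15, 17], [10, 12, 14])
```

In practice most primary studies report only summary statistics, in which case the same ratio is formed directly from the reported means and control group standard deviation.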

Direction of effect size 

1 = significantly (p < .05) favors control 

2 = favors control, not significantly 

3 = favors experimental treatment group, not 
    significantly 

4 = significantly (p < .05) favors experimental 
    treatment 
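
The four direction codes can be derived mechanically from the sign of the effect size and the p value. The sketch below is an assumption-laden illustration: the function name `direction_code` is hypothetical, and the treatment of an effect size of exactly zero (coded 3 here) is not specified in the code book.

```python
def direction_code(effect_size, p_value, alpha=0.05):
    """Map an effect size and p value onto direction codes 1 through 4."""
    significant = p_value < alpha
    if effect_size < 0:
        # Negative effect sizes favor the control group.
        return 1 if significant else 2
    # Zero is grouped with positive values here (an assumption).
    return 4 if significant else 3

codes = [
    direction_code(-0.60, 0.01),  # significantly favors control
    direction_code(-0.10, 0.40),  # favors control, not significantly
    direction_code(0.15, 0.30),   # favors experimental, not significantly
    direction_code(0.80, 0.001),  # significantly favors experimental
]
```

Because an unspecified p value is entered as .999, such studies fall into codes 2 or 3 under this rule.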

Level of significance or p value 
Enter .999 if not specified 

Enter p-value if available; otherwise enter alpha 
level met 

Sample size: enter, right justify (for effect size, 
sum of sample in groups compared) 

Eight Versions of Section II: Specific Characteristics 
of Constructs under Study 

(each code sheet will contain only one version corresponding to one construct) 
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II. Home Construct 



Home Factors under Study 

Standard socioeconomic characteristics 

01 = Parent education 
02 = Parent income 
03 = Parent occupation level 
04 = Housing value (of specific house or apartment) 
05 = Neighborhood or community SES 

Family constellation 

06 = # of children in the family 
07 = Adult-child ratio in the home 
08 = Birth order of student 
09 = Single parent homes 
10 = Crowding ratio (# of family members / 
     rooms in house or apartment) 
11 = # of persons living in the home 

12 = Presence of science-related equipment and 
     documents in the home 
13 = Gender differences: sex-role stereotyping 
14 = Ethnic comparisons (within societies); 
     exclude cross-nation 
15 = Parental aspirations for child and attitudes 
     to education 
16 = Parent involvement in the school and the 
     child's schoolwork (Keeves) 
17 = Generalized SES; judgment criteria may not 
     be specified 
18 = Multiple index SES 

Presence of Home Variable in Study 

1 = independent variable 
2 = mediating or covariate variable 

Method of Collecting Home Information 

1 = Parents' questionnaire 
2 = Students' questionnaire 
3 = Home interview with parents 
4 = Parent interviews outside the home (e.g., 
    the school) 
Method of Collecting Home Information (continued) 

5 = School records, archives 

6 = Not reported 

7 = Teacher or other staff rating of SES 

8 = Teacher or other staff rating of home 
    support and stimulation 

9 = Multiple methods used 

Validity of Home Measure 

59      1 = adequate 

2 = inadequate 

3 = exemplary 




Peer Variables under Study 

Peer grouping 

1 = Within classes (during instruction, e.g., 
    individuals vs. group work) 
2 = Between classes (tracking) 
3 = School activities (athletics, extracurricular) 
4 = Outside of school (e.g., sociological 
    characteristics of peer groupings) 

Participation/Interaction 

5 = Degree of participation/interaction 
6 = Quality or style of participation/interaction 

Subject Placement in Peer Groupings 

1 = Assigned 
2 = Choice within requirement 
3 = Free choice 
4 = Intact groups 
9 = variable under study 

Enter categories compared 

Bases for Placement or Criteria for Group Membership 

1 = Ability 
2 = Interest 
3 = Psychological characteristics (creativity, 
    field dependence, independence) 
4 = Peer acceptance 
5 = Course or curricular enrollment 
6 = Teacher judgment 
7 = Arbitrary or unclear 
8 = Combination of the above 
9 = Uncategorized 

Position of Peer Variable in Study 

1 = independent variable 
2 = mediating or covariate 
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COLS. 



Type of Peer Measure 

1 = Observer report 
2 = Self-report 
3 = Standardized scale or instrument 
4 = Teacher report 
5 = Combination of the above 

Validity of Peer Measure 

1 = adequate 
2 = inadequate 
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II. Motivation Construct 



COLS.   Motivation Variable Under Study 

55-56   01 = academic, n-achievement 
        02 = persistence 
        03 = intrinsic motivation 
        04 = locus of control 
        05 = self-concept (personal) 
        06 = continuing motivation, interest in academic 
             study outside of school 
        07 = feedback/academic evaluation 
        08 = test anxiety 
        09 = attribution of causality 
        10 = perceived ability/success 
        11 = risk-taking 
        12 = academic self-concept or concept of ability 

        Position of Motivation Variable in Study 

57      1 = independent 
        2 = mediating or covariate 

        Motivation Measure 

58      1 = standardized scale or instrument 
        2 = local instrument or scoring technique 
        3 = observations 
        4 = other 

        Reliability of Motivation Measure 

59-60   (enter reliability value; whether reported or 
        estimated) 

        Motivation Level of Subjects 

61      1 = low motivation sample 
        2 = high motivation sample 
        3 = mixed sample; high vs. low motivation group 
        4 = no control for motivation in sample; or 
            convenience sample 

5 = no information on sampling 
COLS.   Orientation of Study 

62      1 = interventionist/experimental (focus of 
            investigation is on increasing or otherwise 
            controlling motivation) 

        2 = non-interventionist studies (including 
            descriptive/correlational investigations) 

        Interventionist Manipulation 

        (leave blank if study non-interventionist in 
        character) 

63      1 = Task, materials 
        2 = Teacher behavior 
        3 = Classroom environment (open vs. closed, 
            co-op vs. comp., matching instruction) 
        4 = Other 

        Validity of Motivation Measure 

64      1 = adequate consideration of independent 
            measure validity. (Does the independent 
            measure represent a reasonable approximation 
            of the variable under consideration?) 

        2 = inadequate consideration of motivation 
            measure validity 
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II. Ability Construct 



COLS. 



Ability variable under study 

1 = General ability, aptitude (intelligence, mental maturity, 
    general or subject specific aptitude, culture free measures 
    of ability) 

2 = Pretested knowledge or skill specific to the particular 
    treatment or criterion measure; cognitive entry behavior. 
    Includes the case where the same process or achievement 
    measure is given pre and post. 

3 = Past achievement (GPA, grades, general or subject area 
    achievement) 

4 = Past rate of learning (efficiency of learning, speed on 
    treatment or criterion related tasks) 

5* = Cognitive style (field dependence, cognitive preference, 
     work style) 

6* = Creativity or creative thinking 

7 = Verbal aptitude 

8 = Quantitative aptitude 

9 = Mechanical-Spatial reasoning 



Position of ability variable in study. If 5 or 6, list dependent 
variable in columns 24, 25 under general characteristics of the 
study. 

1 = blocking variable 

2 = covariate 

3 = independent 

4 = mediating 

5 = covariate and dependent 

6 = independent and dependent 



57 Ability measure 

1 = standardized scale or instrument 

2 = local instrument or scoring technique 

3 = research instrument not yet standardized 

4 = observations, ratings 

5 = not reported 

58 Reliability of ability measure 

1 = reported in study 

2 = estimated 

*Category deleted for lack of studies. 
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Reliability value, whether reported or estimated. 

Estimated or reported general ability level of subjects on 
general ability, past achievement, or past rate of learning. 

1 = low ability (below -1 SD) 

2 = below average (-1 SD to mean) 

3 = average ability (-1 SD to +1 SD) 

4 = above average (mean to +1 SD) 

5 = high ability (above +1 SD) 

6 = information on sample insufficient to make an estimate 

Character of study 

1 = non-intervention, correlational 
2 = interventionist (quality of instruction) 
3 = interventionist (quantity of instruction) 
4 = interventionist (motivational) 
5 = interventionist (classroom environment) 

(interventionist codes: dependent variable related 
treatment between test and criterion) 

Time lapse between test (predictor) and criterion 

1 = concurrent (less than 1 week) 
2 = 1 week to 4 weeks inclusive 
3 = greater than 1 month to 6 months inclusive 
4 = more than 6 months 
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II. Age /Developmental Level Construct 



COLS. 

55 Age/Developmental Level variable under study 

1 = Chronological age, year in school 

2 = Piaget stage 

3 = Piaget logical operations associated with stages 

4 = Kohlberg moral stage 

5 = Kohlberg moral judgments associated with moral stages 

6 = Havighurst's stages 

7 = Erikson's stages 

56 Age/Level measure 

1 = Scored imitation of content and method of presentation 

found in original source 

2 = Novel tasks, individually administered, based on the 

original theory 

3 = Group demonstration with individual responses 

4 = Group administered paper and pencil test 

57 Reliability 

1 = reported in study 

2 = estimated 

58-59 Value of reliability correlation 

60 Method of validation 

1 = Assumed validity based on identification of 

content and method with original source 

2 = Validation by panel of expert judges 

3 = Correlation with results of method advocated by original 

source (e.g., Piaget) 

4 = Construct validity (includes 1, 2 & 3) 
61-62 Value of validity correlations, if any 
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Position of age/level variable in study. If 5 or 6, list 
dependent under general characteristics of the study. 



1 = blocking variable 

2 = covariate 

3 = independent variable 

4 = mediating variable 

5 = covariate and dependent 

6 = independent and dependent 

Reported developmental level of subjects 

1 = concrete operational 
2 = formal operational 
3 = full range from concrete to formal 
4 = preconventional moral stage 
5 = conventional moral stage 
6 = post-conventional moral stage 
7 = 4 and 5 
8 = 5 and 6 
9 = 4, 5 and 6 
10 = Not reported 
11 = 

Character of study 

1 = non-interventionist, correlational 
2 = interventionist (quality of instruction) 
3 = interventionist (quantity of instruction) 
4 = interventionist (motivation) 
5 = interventionist (classroom environment) 



Years difference between groups compared. (Needed if effect 
size/year is to be calculated.) When computing effect size, the older 
group is the "experimental group." If same age, the more formal group 
is the "experimental group." 
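
As a hedged illustration, effect size per year might be obtained by dividing the coded effect size by the years difference between the groups. The function name `effect_size_per_year` and the example values are hypothetical, and returning `None` for same-age comparisons is an assumption, since no per-year figure is defined when the difference is zero.

```python
def effect_size_per_year(effect_size, years_difference):
    """Effect size divided by the age gap between the groups compared."""
    if years_difference <= 0:
        # Same-age groups (e.g., more formal vs. less formal): no rate defined.
        return None
    return effect_size / years_difference

# Hypothetical comparison: older group ahead by 0.75 SD across a 3-year gap.
rate = effect_size_per_year(0.75, 3)
same_age = effect_size_per_year(0.40, 0)
```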
II. Quality of Instruction Construct 



COLS. 

55      Experimental treatment applied to: 

        1= individuals 
        2= small group (2-6) 
        3= class size group (7-90) 
        4= large group (more than 90) 

56      Control treatment applied to: 

        1= comparable size group 
        2= different size (more than +5) 

57      Experimental participation (if course is elective, 
        participation in any part is considered elective 
        unless otherwise specified) 

        1= elective (e.g., high school physics and chemistry) 
        2= required (e.g., most junior high science) 
        3= both elective and required options or 
           unknown (e.g., high school biology) 

58      Control participation 

        1= comparable to experimental 
        2= different from experimental 

59      Experimental group teachers 

        1= regular teacher(s) 
        2= special teacher(s) 
        3= materials only under study 
           (no live teacher as part of independent variable) 

60      Control group teacher 

        1= comparable to experimental 
        2= different 
        3= materials only under study 
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Focus of instructional treatment. Primarily: 

1= non-laboratory (students not working 
   with apparatus) 
2= both laboratory and non-laboratory instruction 
3= laboratory only 
4= other 

Quality of instruction component under study: 

1= curriculum, course or other global comparison 
2= teacher behavior and materials 
   (more controlled, better defined, usually 
   shorter in duration than number 1) 
3= teacher behavior only 
   (basically same materials in all treatments) 
4= materials only 
   (no teacher actively involved, e.g., CAI, TV, 
   A-T, programmed instruction) 



Quality of instruction variable under study 
(incomplete listing): 

Preinstructional strategy 

1= advance organizer vs none or placebo 
2= statement of objectives 
   set induction 

Directness of Instruction 

15= direct (experimental) vs non-direct (control) 
    instruction; teacher directed (exp) vs 
    student self-directed (cont) instruction. 
    In correlational studies, this is 
    "teacher-directives" (explaining, lecturing, directing) 

16= Indirect/direct ratio (Flanders) 
    Lower ID group is experimental group; 
    "teacher-directness" is degree of not 
    using discussion. 

Instruction in processes and logical operations 

25= training in processes of science 
26= training in logical operations 
    (reasoning patterns) a la Piaget 

Structure in verbal content of materials 

24= kinetic structure (High = experimental group) 
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Inductive vs deductive strategies 

20= inductive (control) vs deductive (experimental); 
    inquiry (cont) vs expository (exp) 

21= logical, i.e., inductive and/or deductive, vs 
    random sequencing 

22= expository (lecture-discussion) vs 
    laboratory (control) 

70= inductive, inquiry based curriculum 
    (many curriculum projects of the 60's) vs traditional 
    curriculum. 

Method of obtaining observations/measures of variable 
under study: 

1= self report 

2= expert rating 

3= student rating 

k= expert co;r'- • 

5= specializes ^v*ng without classroom verification 

6= predetermined in structure of materials 

7= cannot be determined 

8= both 5 and 6 



Interobserver agreement 
1= simple percent 

2= other method (eg. Scott's coefficient) 
3= not reported 

Enter percent agreement value (leave blank if not reported) 

Length of treatment 

1= less than or equal to one hour 

2= greater than one, less than ten hours 

3= 10 to 50 hours 

4= a course (16 weeks or more, about one hour per weekday) 
5= cannot be determined or estimated 

Covariates partialed out of effect size 
1= none 

2= ability (IQ, aptitude) 

3= pretested knowledge and/or achievement 

4= sociological variables (SES, classroom environment, etc.) 

5= psychological variables (motivation, personality) 

6= 2 and 3 

7= three or more of the above 
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Control group access to treatment content 
1= none 

2= not comparable 

3= comparable (approximately equivalent) 
II. Quantity of Instruction Construct 




Instructional group one mean values 
Minutes per session 
Sessions per week 
Number of weeks 
Number of years 

Reported estimate or observed percent of time on task 

Instructional group two mean values 
Minutes per session 
Sessions per week 
Number of weeks 
Number of years 

Reported estimate or observed percent of time on task 
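
The quantity fields for the two instructional groups can be combined into a total exposure figure. The sketch below is illustrative only: it assumes the number of weeks covers the whole treatment (the years field is ignored), and the function names and example values are hypothetical.

```python
def total_minutes(minutes_per_session, sessions_per_week, weeks):
    """Scheduled instructional minutes over the whole treatment."""
    return minutes_per_session * sessions_per_week * weeks

def engaged_minutes(scheduled_minutes, percent_on_task):
    """Scheduled minutes scaled by the percent of time on task."""
    return scheduled_minutes * percent_on_task / 100

# Hypothetical groups: 45-minute sessions for 36 weeks, 5 vs. 3 sessions a week.
group_one = total_minutes(45, 5, 36)
group_two = total_minutes(45, 3, 36)
group_one_engaged = engaged_minutes(group_one, 80)
```

A comparison between the two groups can then be stated either in scheduled minutes or, where time on task is reported, in engaged minutes.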



Quantity of Instruction Variable Under Study 

1 = minutes per session 

2 = number of minutes 

3 = sessions per week 

4 = number of sessions 

5 = number of weeks 

6 = number of years 

Position of quantity variable in study. If 4 or 5, list 
dependent variable in column under general characteristics 
of the study 

1 = covariate 

2 = independent 

3 = mediating 

4 = covariate and dependent 

5 = independent and dependent 

Method of measuring quantity 

1 = student self-report 

2 = teacher report 

3 = trained observer 

Character of study 

1 = non-interventionist, correlational 

2 = interventionist, experimental 



Social Environment of the Classroom 

Environment Measure 

1= Learning Environment Inventory (LEI) 
2= Modified LEI 
3= My Class 
4= Classroom Environment Scale (CES) 
5= Learning Environment Inventory (1960 version) 

Prior Achievement Controls (by subject area) 

1= General Science 
2= Life Science 
3= Physical Science 
4= Mathematics 
5= Social Science 
6= Humanities 
7= General Achievement 
8= Attitude toward subject matter 
9= Miscellaneous 

Learning Environment Inventory Scale 

Cohesiveness 
Friction 
Cliqueness 
Satisfaction 
Speed 
Difficulty 
Apathy 
Favoritism 
Formality 
Goal Direction 
Democracy 
Disorganization 
Diversity 
Environment 
Competition 

Learning Outcome Domain 

1= Cognitive 
2= Attitudinal 
3= Behavioral 
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COLS. 



        Learning Outcome Content Area 

60      1= General Science 
        2= Life Science 
        3= Physical Science 
        4= Mathematics 
        5= Social Science 
        6= Humanities 
        7= General Achievement 
        8= Attitude toward Subject Matter 
        9= Miscellaneous 

        Number of Classes in Study 

61-63   (enter number of classes) 

        Unit of Analysis 

64      1= Individual Student 
        2= Subgroups of Students 
        3= Classes 
        4= Schools 

        Reliability of Social Environment Measure 

65-66   (enter reliability value; whether reported or estimated) 

67-68   Reliability of Learning Outcome Measure 

        (enter reliability value; whether reported or estimated) 
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